Tight sandstone gas reservoirs have poor physical properties and complex reservoir space, which makes development difficult and final recovery low. The main reason is that the local enrichment of remaining gas is difficult to describe accurately. In this paper, a high-dimensional index system of factors affecting remaining gas is established, and the main controlling factors are obtained through machine-learning-based dimensionality reduction. First, two machine learning algorithms are used to identify the main factors affecting the remaining gas. These factors are then used as input to K-means, which performs unsupervised learning on the grids and labels them. Finally, the spatial coordinate parameters of the grids are integrated, thresholds are set, and each grid is searched dynamically and recursively, yielding the distribution of remaining gas types for each layer. The results show that the main factors affecting the remaining gas are reserve abundance, effective thickness and pressure. In addition, the first and second layers are dominated by high remaining gas reservoirs, the third layer has a relatively large number of high remaining gas reservoirs, the fourth and fifth layers are dominated by medium remaining gas reservoirs, and the sixth layer has very little remaining gas.
1. Introduction
Since the first discovery of tight sandstone gas in the San Juan Basin of the United States in 1927, significant breakthroughs have been made globally [1]. With the increasing global demand for energy and the depletion of conventional oil and gas resources, tight gas has emerged as a crucial domain in the development of unconventional natural gas resources, being widely distributed across major oil and gas bearing basins worldwide, and in recent years, it has become a primary contributor to the growth in global natural gas reserves and production [2, 3]. China is rich in tight gas resources, accounting for approximately 10% of the global total, and it is the world’s third-largest producer of tight gas, thus possessing substantial potential for resource reserves and research value in this sector (Figure 1) [4, 5].
Fig. 1. Global tight gas reserves map (a) and China’s tight gas production from 2009 to 2023 (b)
The tight sandstone gas reservoirs in the study area have significant proven geological reserves. However, they are characterized by low porosity, low permeability, low pressure, and low productivity, which makes it challenging to maintain stable production in individual wells and poses a difficult problem for exploration and development [6]. The low porosity and permeability have left a considerable amount of remaining gas that has not been effectively developed, which calls for a better understanding of the geological characteristics of the gas reservoir and of the main parameters affecting the remaining gas reserves. Currently, research both domestically and internationally focuses primarily on predicting the physical parameters of tight sandstone gas reservoirs, on reservoir and lithofacies classification, and on lithology and fluid identification; studies on the remaining gas in tight sandstone reservoirs are scarce.
Traditional methods for studying tight sandstone reservoirs are divided into qualitative analysis and numerical simulation. Qualitative analysis [7, 8] involves evaluating reservoir conditions, including basic characteristics such as reservoir lithology, pore space, and physical properties, combined with research on sedimentary environments and diagenetic analyses to explore the genetic mechanisms of the reservoirs in the study area, and employs various reservoir evaluation methods for comparative study. This method focuses on reservoir type and physical property analysis, and is limited by subjective experience and professional knowledge, which makes it difficult to provide accurate values and quantify risks. Numerical simulation methods [9, 10] comprehensively evaluate reservoirs by modeling fluid migration in oil and gas reservoirs, rock mechanical properties, material balance, and heat transfer processes. However, the accuracy of numerical simulation depends on the quality of the data, and the simulation process requires a significant amount of time, which is not conducive to rapid reservoir evaluation.
Machine learning methods primarily focus on predicting the physical properties of tight sandstone, classifying and evaluating reservoirs, quantitatively classifying and predicting lithofacies, and identifying and classifying lithology and fluids. For the prediction of physical properties, functional network (FNS), support vector machine (SVM), and random forest (RF) methods are employed for saturation prediction [11]. Porosity prediction utilizes a combined model of a one-dimensional convolutional neural network and a gated recurrent unit (1D CNN-GRU) with logging and seismic attribute data [12]. Permeability prediction is based on convolutional neural networks (CNN) and Bayesian-regularized neural networks [13, 14]. These methods use logging parameters and seismic data as input parameters, properly configure various machine learning models, and establish prediction models for the physical properties of tight sandstone [15, 16]. They can capture the nonlinear relationships between the input data and the physical properties, but they require large volumes of high-quality data. For reservoir classification and evaluation, techniques such as Lasso dimensionality reduction are used to select logging and physical parameters as input parameters. Multi-kernel support vector machine (MK-SVM) models or gradient boosting decision tree (GBDT) algorithms are then trained and tested on the resulting datasets. This approach can handle complex relationships between many types of data and the reservoir, but it risks overfitting on high-dimensional, large datasets. For lithofacies classification, CNN [17, 18], unsupervised machine learning algorithms [19], and fuzzy clustering algorithms [20] are used. Rock physics and logging parameters are used as inputs to divide the lithology into four lithofacies types: clay, sand, clayey sand, and sandy clay. This approach captures the correlations in the data more comprehensively and improves prediction accuracy, but it also requires high-quality data. For lithology and fluid identification and classification, deep neural networks are combined with the oversampling method (MAHAGIL) [21]. Alternatively, an improved XGBoost method, PSO-XGBoost [22], is used for lithology and fluid identification using logging data from tight sandstone gas reservoirs [23]. This method can better utilize various types of data to accurately identify lithology and fluids.
The aforementioned research focuses on reservoir characteristics, which is beneficial for the study of remaining gas. However, the geological conditions of tight sandstone gas reservoirs are complex, and the differences in production start times and development strategies often lead to uneven pressure drops in the reservoir strata during the middle and late stages of development, with localized enrichment of remaining gas. It is challenging to determine the influencing factors of remaining gas, the locations, and the distribution types of the remaining gas. This hinders the provision of measures for the later-stage extraction of remaining gas, resulting in low recovery rates for tight gas reservoirs.
Current research on remaining gas and remaining oil primarily focuses on experimental methods, numerical simulation, and extraction measures. For experimental methods, micro-CT and core displacement experiments [24] visually demonstrate the state of remaining oil under different waterflooding rates. A series of quantitative image processing methods and remaining oil classification methods are used to systematically evaluate and study the characteristics of remaining oil under different flow rates. Although experiments offer strong control and provide direct data, they are costly, time-consuming, and unable to consider the numerous factors affecting remaining gas. For numerical simulation, three-dimensional geological models based on single sand bodies are established using core observations, logging, production, and monitoring data. Reasonable development strategies are proposed for the extraction of remaining oil in individual sand layers during the late stage of tight sandstone reservoir development. Although numerical simulation can accurately model strata, it requires extensive data and does not reveal the main controlling factors affecting remaining gas [25]. For extraction measures, the use of existing wells and lateral drilling of horizontal wells can significantly reduce development costs. This approach effectively taps into interwell remaining reserves and improves their recovery, and has led to a set of drilling and completion technologies suitable for horizontal sidetracking in tight sandstone gas reservoirs [26]. Although extraction measures can directly obtain actual geological information and sample data, they are expensive, have long cycles, and rely on human experience, and data acquisition and analysis are relatively difficult [27].
To sum up, tight sandstone reservoirs have been extensively studied at home and abroad, but the study of remaining gas is not deep enough. Existing methods cannot effectively determine the location of the remaining gas, nor can they dynamically analyze the various data needed to improve its recovery. Therefore, an intelligent algorithm is needed that can efficiently and accurately determine the type and location of the remaining gas. With the development of artificial intelligence, machine learning methods show good nonlinear modeling ability, self-learning, adaptability and fault tolerance. They can process various types of dynamic and static data and effectively determine the main controlling factors affecting remaining gas. In this work, an improved K-means algorithm is established to cluster the main controlling factors and determine the distribution types of remaining gas, which provides a basis for subsequent measures to tap the remaining gas potential.
2. Methods
This paper employs supervised machine learning methods (Random Forest and XGBoost) to identify the main controlling factors of remaining gas, and unsupervised learning (the dynamic fine-grained K-means recursive method) to classify the remaining gas. Based on an analysis of the heterogeneity, connectivity, and geological characteristics of the study area, the features of remaining gas are mined and analyzed, and the position and distribution type of the remaining gas are located.
2.1. Schematic diagram of the method for the main controlling factors of remaining gas
The importance of the analysis of the main controlling factors of remaining gas lies in its ability to screen out the main influencing factors affecting the distribution and reserves of remaining gas from various types of data during drilling, extraction, and development processes, effectively utilizing diverse, dynamic and static data and effectively avoiding the risk of overfitting.
Random Forest (RF) is a machine learning algorithm proposed in 2001 by Leo Breiman by combining the Bagging ensemble learning theory with the random subspace method, which is an extended variant of Bagging (Figure 2) [28, 29]. Random Forest builds on the Bagging ensemble with decision trees as the base learners, further introducing random attribute features during the training process of the decision trees.
Fig. 2. Random Forest model
The basic idea of Random Forest regression is as follows: Firstly, k samples are extracted from the original training set using bootstrap sampling, and the sample size of each sample is the same as that of the original training set; secondly, k samples are used to establish k decision tree models, yielding k regression results; finally, the average of k decision tree results is combined [30].
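The three steps above (bootstrap sampling, one tree per sample, averaging) can be sketched in plain Python. This is a minimal illustration and not the implementation used in the study: the base learner is simplified to a depth-1 regression tree on one randomly chosen feature, and the names `fit_stump`, `fit_forest`, and `predict_forest` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Fit a depth-1 regression tree on one feature by minimizing squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # candidate threshold between two distinct values
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        ml, mr = mean(left), mean(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # the feature is constant in this bootstrap sample
        m = mean(y)
        return (feature, float("inf"), m, m)
    _, thr, ml, mr = best
    return (feature, thr, ml, mr)


def predict_stump(stump, row):
    feature, thr, ml, mr = stump
    return ml if row[feature] <= thr else mr


def fit_forest(X, y, k=50, seed=0):
    """Train k stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feature = rng.randrange(d)  # simplified random attribute selection
        forest.append(fit_stump(Xb, yb, feature))
    return forest


def predict_forest(forest, row):
    """Average the k individual tree predictions."""
    return mean(predict_stump(s, row) for s in forest)
```

Each bootstrap sample has the same size as the training set, and the final prediction is the plain average of the k tree outputs, as in the description above. Full Random Forest implementations grow deeper trees and draw a random feature subset at every split.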
Extreme Gradient Boosting (XGBoost) is a machine learning algorithm based on gradient boosting decision trees, proposed by Tianqi Chen in 2014 (Figure 3). Its principle is as follows. The model undergoes multiple rounds of iteration, and each round produces a weak learner. Each new learner is trained on the residuals left by the previous rounds, so that the accuracy of the ensemble improves from round to round. Finally, the weighted sum of the predictions from all weak learners yields the final result [31, 32].
Fig. 3. XGBoost illustration. a) CART model; b) Illustration of an ensemble tree
The prediction process of XGBoost for the ith sample is expressed as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \tag{1}$$
where $\hat{y}_i$ is the predicted value for the ith sample, and $f_k(x_i)$ is the prediction result of the kth tree for the ith sample. The process is as follows: for the ith sample, the first tree predicts $f_1(x_i)$, the second tree predicts $f_2(x_i)$, and the predictions from a total of K trees are summed together to obtain the final predicted value for the ith sample.
In XGBoost, for regression problems, a squared loss term is constructed as the objective function. Taylor series are used to expand the objective function to address the difficulty of optimizing the function, and gradient descent algorithms are employed for optimization. The general form of the objective function in XGBoost is as follows:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{2}$$
where $\sum_{i=1}^{n} l(y_i, \hat{y}_i)$ is the total loss function value for n samples, which can be calculated by comparing the predicted values $\hat{y}_i$ of the samples with the true values $y_i$ of the samples. $\Omega(f_k)$ is the regularization term, which serves to increase the penalty intensity and control the complexity of the model to avoid or reduce overfitting, thus enhancing the model’s generalization capability. The formula for the regularization term is as follows:
$$\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2 \tag{3}$$
where γ and λ are the penalty coefficients controlling the model’s complexity, T is the number of leaf nodes, and $\omega_j$ is the weight of leaf node j. By performing the second-order Taylor expansion of the objective function in equation (2), finding the partial derivatives, and setting them equal to zero, the optimal solution corresponding to leaf node j can be obtained:
$$\omega_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{4}$$
where $G_j$ and $H_j$ are the sums, over the samples falling in leaf node j, of the first and second derivatives of the loss with respect to the prediction. At this point, the objective function is:
$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{5}$$
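To make the leaf-weight result concrete, the sketch below evaluates the standard XGBoost closed forms for a single leaf, ω* = −G/(H + λ) and Obj* = −½·G²/(H + λ) + γ, where G and H are the sums of the first and second derivatives of the loss over the samples in the leaf. It assumes the squared loss l = ½(y − ŷ)², for which the first derivative is ŷ − y and the second derivative is 1. The function name and default values are illustrative, not taken from the paper.

```python
def leaf_weight_and_objective(y_true, y_pred, lam=1.0, gamma=0.0):
    """Optimal weight and objective of one leaf (T = 1) under squared loss.

    Assumes l = 0.5 * (y - y_hat)^2, so each sample contributes a first
    derivative (y_hat - y) and a second derivative of 1.
    """
    g = [p - t for t, p in zip(y_true, y_pred)]  # first derivatives
    h = [1.0] * len(y_true)                      # second derivatives
    G, H = sum(g), sum(h)
    weight = -G / (H + lam)
    objective = -0.5 * G * G / (H + lam) + gamma  # gamma * T with T = 1
    return weight, objective


# With lam = 0 the optimal weight is simply the mean residual of the leaf.
w, obj = leaf_weight_and_objective([3.0, 5.0], [0.0, 0.0], lam=0.0)
```

Increasing λ shrinks the leaf weight toward zero, which is how the regularization term limits the contribution of any single leaf.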
2.2. Remaining gas classification model
The improved K-means method performs a fine-grained recursion on each grid based on its spatial location. This dynamic fine-grained K-means recursive method is used to establish a remaining gas classification model, from which the distribution map of reserves for each layer is obtained. This provides a new approach for the later stage of development of remaining gas in tight sandstone reservoirs.
K-means. Given a dataset A = {a1, a2,..., an}, K-means partitions it into k non-overlapping sets (clusters) C = {C1,...,Ck} [33, 34]. The main idea is as follows: initialize the cluster centers, assign each data point to the cluster whose center is nearest, and update the cluster centers, repeating until the cluster centers no longer change.
New model: dynamic fine-grained K-means recursive method. The dynamic fine-grained K-means recursive method is an improved version of the K-means algorithm, which performs unsupervised learning on remaining gas at different moments using the dominant factors and spatial parameters of each grid. First, at time t, the data are randomly divided into k clusters; the distance between every data point and the mean of each cluster is calculated, each data point is assigned to the nearest cluster center, and the mean of each cluster is recalculated. This process is repeated until the cluster centers no longer change, and each grid is labeled with its category. However, K-means only considers the dominant factors of each grid and does not account for the spatial location of the remaining gas. Second, by incorporating spatial parameters, the number of grids of each category within the domain (neighborhood) of each grid is counted. If one category accounts for at least 80% of the grids in the domain, the grid is assigned to that category (grids with no remaining gas are included in the count to ensure that the number of grids within a certain distance of each grid is the same). This is repeated until clustering is complete, giving the final distribution type of remaining gas at time t.
Given the grid dataset $A_i = \{x_i; y_i; z_i; h_i; p_{fi}; \Omega_{gi}; G_i\}$ at moment t, where the spatial parameter coordinate set is $F_i = \{x_i; y_i; z_i\}$ and the set of dominant factors is $M_i = \{h_i; p_{fi}; \Omega_{gi}; G_i\}$, then $A_i = \{F_i; M_i\}$, where i is the ith (i = 1, 2, …, n) grid, $x_i$ is the x-axis coordinate of grid i, $y_i$ is the y-axis coordinate of grid i, $z_i$ is the z-axis coordinate of grid i, $h_i$ (m) is the effective thickness of grid i, $p_{fi}$ is the formation pressure of grid i, $\Omega_{gi}$ (m³/m²) is the abundance of grid i, and $G_i$ (10⁸ m³) is the remaining gas of grid i.
Step 1. Normalize the dataset $M_i = \{h_i; p_{fi}; \Omega_{gi}; G_i\}$ of grid i to facilitate the calculation of Euclidean distance in subsequent algorithms. Data normalization helps improve the quality and stability of clustering results. Taking the effective thickness $h_i$ of a certain grid as an example:
$$h_i' = \frac{h_i - h_{\min}}{h_{\max} - h_{\min}} \tag{6}$$
where $h_{\max}$ is the maximum value of the data and $h_{\min}$ is the minimum value.
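Step 1 is ordinary min-max scaling, which maps each factor to the range [0, 1] so that no single factor dominates the Euclidean distance. A minimal sketch is given below; the function name is hypothetical, and the fallback for a constant column is an assumption not discussed in the text.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] as (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # assumption: a constant column is mapped to all zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Thickness values (m) of the first three sublayers listed in Table 1.
normalized = min_max_normalize([3.72, 1.97, 2.76])
```

The same function would be applied separately to each dominant factor before clustering.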
Step 2. Divide all grids into k sets $C_j$ (j = 1, 2, …, k) and calculate the center $\mu_j$ of each set, aiming to solve the objective as per formula:
$$\min_{C_1, \dots, C_k} \sum_{j=1}^{k} \sum_{M_i \in C_j} \lVert M_i - \mu_j \rVert^2 \tag{7}$$
Step 3. Calculate the distance between each grid and all cluster centers, assigning each grid to the nearest cluster center. The distance between grid $M_i$ and cluster center $\mu_j$ is:
$$d(M_i, \mu_j) = \sqrt{\sum_{m} \left(M_{i,m} - \mu_{j,m}\right)^2} \tag{8}$$
where m indexes the normalized dominant factors.
Step 4. Recalculate the cluster center for each cluster (assuming that the set $C_j$ of a certain cluster has p samples):
$$\mu_j = \frac{1}{p} \sum_{M_i \in C_j} M_i \tag{9}$$
Step 5. Repeat Steps 3-4 until the division no longer changes or the maximum number of iterations is reached, obtaining clusters $C_j$ (j = 1, 2, …, k). For grids in different clusters, assign different colors, where i → j indicates that the ith cluster corresponds to the jth color.
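Steps 2 to 5 amount to the standard K-means loop: assign each grid to its nearest center, recompute each center as the mean of its members, and stop when the assignment is stable. The sketch below is a plain-Python illustration under two assumptions that differ from or go beyond the text: the initial centers are k randomly sampled points (rather than the means of a random partition), and an empty cluster simply keeps its previous center. The function name and arguments are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (sequences of normalized factors) into k groups."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct sampled points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 3: assign every point to the nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # Step 5: stop when the division is unchanged
            break
        labels = new_labels
        # Step 4: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In the workflow described here, each point would hold the normalized dominant factors of one grid, and the returned label would determine the grid's color.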
Step 6. Add the coordinate set parameter to the clusters obtained from clustering to obtain a new dataset Wi, and continue the recursion based on this dataset.
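Step 7. For a grid in dataset Wi, count the categories within a certain distance. Assuming dataset Wi now has n samples, with each grid $p_i$ corresponding to a category $c_i$, the training dataset Wi can be represented as:
$$W_i = \{(p_1, c_1), (p_2, c_2), \dots, (p_n, c_n)\} \tag{10}$$
$$p_i = (x_i, y_i, z_i), \quad c_i \in \{1, 2, \dots, k\} \tag{11}$$
The distance between any two points is:
$$d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \tag{12}$$
For any point $p_i$, its ε-neighborhood is defined as:
$$N_\varepsilon(p_i) = \{\, p_j \in W_i : d(p_i, p_j) \le \varepsilon \,\} \tag{13}$$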
Step 8. For a grid in dataset Wi, count the categories of the other grids within its domain. If another category accounts for at least 80% of the grids within the domain, reassign the grid to that category. Repeat until all grids have completed the recursion.
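Steps 7 and 8 can be illustrated with a small neighborhood-voting routine: for every grid, collect the categories of the other grids within distance ε and switch the grid to the dominant category when that category reaches the 80% share. The sketch below is an assumption-laden illustration, not the authors' implementation. It updates labels in place and repeats passes until nothing changes, it divides by the number of neighbors actually found rather than padding the neighborhood to a fixed size, and it uses a brute-force O(n²) search that would need a spatial index for millions of grids. All names are hypothetical.

```python
import math
from collections import Counter


def neighborhood_relabel(coords, labels, eps, threshold=0.8, max_passes=10):
    """Reassign each grid to the category that dominates its eps-neighborhood."""
    labels = list(labels)
    for _ in range(max_passes):
        changed = False
        for i, p in enumerate(coords):
            # Categories of the other grids within distance eps of grid i.
            neighbors = [
                labels[j]
                for j, q in enumerate(coords)
                if j != i and math.dist(p, q) <= eps
            ]
            if not neighbors:
                continue
            category, count = Counter(neighbors).most_common(1)[0]
            if category != labels[i] and count / len(neighbors) >= threshold:
                labels[i] = category
                changed = True
        if not changed:  # every grid is stable, so the recursion is complete
            break
    return labels


# Five grids on a line; the isolated category-1 grid is absorbed by category 0.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
smoothed = neighborhood_relabel(coords, [0, 0, 1, 0, 0], eps=1.0)
```

A lower threshold smooths more aggressively, while a threshold near 100% leaves most isolated grids unchanged.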
The flow chart is presented in Figure 4.
Fig. 4. Flowchart of dynamic fine-grained k-means recursive algorithm
The flowchart illustrates the entire process of the dynamic fine-grained K-means recursive algorithm, transforming the initially organized grid data through recursive algorithms into a visual representation of the remaining gas distribution for each layer. Below is an example to illustrate this. Figure 5 represents an 8×8×8 dataset (at time t).
Fig. 5. Example recursive diagram at time t
As shown in Fig. 5a, no classification has been performed at this stage, and no spatial coordinate parameters have been added. Clustering the data with K-means using only the dominant factors gives Fig. 5b. K-means simply divides all grids into three categories (red, green, and blue) and cannot form individual color blocks. After incorporating spatial coordinate parameters, as shown in Fig. 5c, the lower right part of the dataset mainly consists of green blocks, the upper left mainly consists of red blocks, and the back mainly consists of blue blocks. However, blocks of other colors remain inside each color block. In the dynamic fine-grained K-means recursive algorithm, the other color blocks within the domain of each grid are counted and then judged using a discriminant condition. After recursively processing all grids, a better classification of remaining gas categories is finally formed, as shown in Fig. 5d.
3. Example
Data from a specific tight sandstone gas reservoir were collected to study the main controlling factors and distribution types of remaining gas.
The first step is to analyze the overview and geological factors of the study area. The influencing factors are permeability, porosity, depth, formation pressure, natural gas saturation, natural gas mass density, residual gas volume, abundance and thickness.
The next step is the feature selection of remaining gas. For the nine factors, estimation models were used to evaluate each factor, ultimately ranking the importance of factors affecting remaining gas. Through comprehensive analysis, abundance and effective thickness were selected as the main controlling factors, with the consideration of the importance of formation pressure in tight sandstone gas reservoirs also making it a main controlling factor.
The final step is the dynamic fine-grained K-means recursive method for remaining gas (Figure 6). Each grid was clustered based on abundance, effective thickness, and formation pressure. After assigning a category to each grid, spatial coordinate parameters were added. First, the distribution of remaining gas reserves for each layer without recursion was obtained. Using the dynamic fine-grained K-means recursive method, the category of each grid was updated based on the number of categories within its domain, forming category blocks. Finally, the distribution of remaining gas reserves for each layer was obtained.
Fig. 6. Research framework diagram for remaining gas at time t
3.1. Study area overview
The reservoir is located within a basin covering an area of approximately 33 × 10⁴ km². The entire basin can be divided into five major tectonic units: the Yishan Slope in the main body of the basin, the Jinshi Flexural Belt on the eastern margin, the Tianhuan Depression in the west, the Yimeng Uplift in the north, and the Weibei Uplift in the south. It is a large, polycyclic cratonic basin characterized by overall subsidence, depression migration, and simple structure. The reservoir has a burial depth of 2600-3000 m and a pressure coefficient of 0.4-0.5, and it contains multiple gas-bearing layers. The initial gas saturation is 40-60%, and the regional structure is gentle with very few faults. The basin mainly develops tight sandstone gas reservoirs. Core permeability is typically below 1 × 10⁻³ μm², and porosity ranges from 7% to 10%. Because of the strong heterogeneity of the reservoir, it belongs to the typical category of tight sandstone reservoirs that are difficult to exploit. As exploration and development progress, the degree of reserve development is increasing, and so is the difficulty of development. Therefore, research on the geological characteristics relevant to development becomes particularly important.
3.2. Analysis of the main controlling factors of remaining gas
Main controlling factors analysis not only identifies the influential factors that significantly contribute to the distribution and classification of remaining gas but also reduces the risk of overfitting during the model fitting process. By analyzing the study area, the factors affecting remaining gas were identified, and then the XGBoost and RF algorithms were utilized to screen these factors, resulting in the final main controlling factors.
3.2.1. Dataset and estimation model evaluation
The dataset used in this study was the productivity data for 30 sublayers of a tight sandstone gas field (derived from geological model parameters), including nine features: original formation pressure, reservoir depth, permeability, gas saturation, gas mass density, gas reserves, porosity, reservoir thickness, and abundance. These nine features were used as input parameters, with the remaining gas quantity as the output parameter. Part of the data is presented in Table 1.
Table 1. Various parameter values for each sublayer
Layer | Depth (m) | Porosity | Formation pressure (MPa) | Gas saturation | Gas mass density (mD) | Permeability (mD) | Thickness (m) | Abundance (m³/m²) | Remaining gas quantity (10⁸ m³) |
|---|---|---|---|---|---|---|---|---|---|
1 | 1254.22 | 0.07 | 230.76 | 0.33 | 152.22 | 0.09 | 3.72 | 16.54 | 49.32 |
2 | 1257.79 | 0.07 | 230.81 | 0.40 | 152.25 | 0.11 | 1.97 | 13.36 | 15.42 |
3 | 1270.74 | 0.08 | 231.00 | 0.53 | 152.36 | 0.23 | 2.76 | 27.82 | 39.57 |
… | … | … | … | … | … | … | … | … | … |
28 | 1510.84 | 0.05 | 234.61 | 0.48 | 154.42 | 0.36 | 0.64 | 5.52 | 0.16 |
29 | 1503.03 | 0.11 | 234.50 | 0.51 | 154.36 | 0.27 | 2.11 | 26.10 | 22.13 |
30 | 1522.12 | 0.07 | 234.79 | 0.37 | 154.52 | 0.13 | 8.58 | 48.94 | 76.99 |
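To quantitatively assess the performance of the estimation models, four statistical metrics were employed: mean absolute error (MAE), mean relative error (MRE), root mean square error (RMSE), and coefficient of determination (R²). They are as follows:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{14}$$
$$MRE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{\left| y_i \right|} \tag{15}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{16}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{17}$$
where $y_i$ is the true value of the ith sample, $\hat{y}_i$ is its predicted value, $\bar{y}$ is the mean of the true values, and n is the number of samples.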
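The four metrics named above can be computed directly from their standard definitions: MAE is the mean of |y − ŷ|, MRE is the mean of |y − ŷ| / |y|, RMSE is the square root of the mean squared error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a plain-Python illustration with a hypothetical function name; it assumes no true value is zero (for MRE) and that the true values are not all identical (for R²).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MRE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mre = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MRE": mre, "RMSE": rmse, "R2": r2}


metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 5.0, 6.0, 8.0])
```

MAE and RMSE carry the units of the target, whereas MRE and R² are dimensionless, which matters when comparing models across differently scaled outputs.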
3.2.2. XGBoost
The entire dataset was randomly divided into two parts, namely the training set (80%) and the test set (20%). The gradient boosting regression model with XGBoost algorithm was employed, combining multiple weak learners (decision trees) to create a powerful predictive model for regression tasks. The parameter combinations of the XGBoost model were automatically adjusted by grid search (GridSearchCV) to determine the optimal parameter combination for performance evaluation.
To investigate the impact of each parameter on remaining gas, the parameter values in the XGBoost algorithm were set and adjusted to obtain the most suitable parameter values for the model, as shown in Table 2.
Table 2. XGBoost parameter adjustment table
Parameter | Range | Step | Optimal Value |
|---|---|---|---|
n_estimators | [10,120] | 10 | 20 |
Maximum depth of the tree | [1, 10] | 2 | 3 |
min_child_weight | [1, 10] | 2 | 10 |
Learning rate | [0.01,0.3] | 0.03 | 0.1 |
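The 80/20 split and the exhaustive parameter search described in this subsection can be sketched with the standard library alone. This is only an illustration of the idea behind a grid search, not the GridSearchCV call used in the study: `train_test_split` and `grid_search` are hypothetical helpers, `score_fn` stands in for "train the model with these parameters and return a validation score", and the small grid in the usage example is illustrative rather than the full ranges of Table 2.

```python
import itertools
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative usage with a toy score in place of real model training.
train, test = train_test_split(list(range(30)))
grid = {"max_depth": [1, 3, 5], "learning_rate": [0.01, 0.1, 0.3]}
best, _ = grid_search(
    grid, lambda p: -(abs(p["max_depth"] - 3) + abs(p["learning_rate"] - 0.1))
)
```

The cost grows with the product of the grid sizes, so coarse steps such as those in Table 2 keep the search tractable.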
For a more comprehensive comparison, the R2, mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used as reference indicators for evaluation:
Metric | Value |
|---|---|
R² | 0.7 |
MSE | 24 |
MAE | 20 |
MAPE | 29 |
The output was the importance of each influencing factor for the remaining gas content, taken from the feature importances of the XGBoost model found during the grid search, aggregated with the sum function, and visualized using functions from the pandas library.
The XGBoost algorithm can calculate the importance of each feature, so the feature variables can be sorted by importance and those with markedly lower importance filtered out. Analyzing the eight influencing factors against remaining gas yielded the final ranking of important factors affecting remaining gas content (Figure 7): effective thickness, abundance of reserves, depth, permeability, gas saturation, and porosity, with other factors having a minimal impact on remaining gas. The most important factor is the effective thickness, which determines the flow capacity and storage capacity of the reservoir, affects the permeability, increases the flow channels, and improves reservoir production. High reserve abundance slows gas release, maintains reservoir pressure, and enhances recovery and productivity.
Fig. 7. XGBoost feature selection
3.2.3. Random Forest (RF)
RF regressor is an ensemble learning method that combines multiple decision trees for prediction. It is typically used for regression tasks where the goal is to predict continuous target variables. The results of parameter adjustment are shown in Table 3.
Table 3. Random Forest parameter values
Parameter | Range | Step | Optimal Value |
|---|---|---|---|
n_estimators | [10,100] | 10 | 70 |
Maximum depth of the tree | [1, 10] | 2 | 1 |
Minimum number of samples in a leaf node | [1, 10] | 2 | 1 |
Minimum number of samples in an internal node | [1, 10] | 2 | 3 |
For a more comprehensive comparison, R2, mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used as reference indicators for evaluation. The results are shown below.
Metric | Value |
|---|---|
R² | 0.5 |
MSE | 37 |
MAE | 33 |
MAPE | 45 |
The model output represents the impact of each parameter on the remaining gas reserves; the importance values were normalized within the Random Forest model to obtain the final result.
The eight influencing factors were analyzed against remaining gas using RF regression, yielding the final ranking of factors affecting the remaining gas content (Figure 8): abundance of reserves, pressure, effective thickness, depth, gas saturation, and gas mass density, with other factors having minimal impact in the model. The top-ranked factor is the abundance of reserves, which also ranks highly (second) in the XGBoost model. Second, pressure is essential to promote the flow of remaining gas and enhance recovery. As a dynamic displacement mechanism, it provides energy to overcome reservoir resistance and viscosity, and affects gas-oil interactions to maintain reservoir stability and productivity.
Fig. 8. RF feature selection
It can be concluded that reserve abundance and effective thickness are the dominant factors affecting the remaining reserves. In addition, pressure, depth, gas saturation, gas mass density, permeability and porosity are secondary influencing factors. Combined with the analysis of gas reservoir characteristics, the factors that affect gas reservoir characteristics are pressure, depth, permeability and porosity, which are all static factors that determine the amount of gas in the reservoir and the space available to the remaining gas. The relevant literature indicates that pressure has a strong impact on oil and gas saturation and rock porosity, and it carries considerable importance in the models used here. Therefore, it is also treated as a main controlling factor. In summary, the factor most relevant to the remaining gas volume is reserve abundance, followed by effective thickness and pressure.
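One simple way to see how the two rankings support this conclusion is to average each factor's position across the XGBoost and RF lists reported above. The sketch below is only one possible aggregation and is an assumption: the text does not specify how the comprehensive analysis combined the two rankings. Factors that a model did not list are placed just past the end of that model's list, and the function name and short factor labels are hypothetical.

```python
def mean_rank(rankings, features):
    """Average each feature's 1-based position over several ranked lists."""
    scores = {}
    for feature in features:
        ranks = []
        for ranking in rankings:
            # Unlisted features share the position just past the end of the list.
            position = ranking.index(feature) + 1 if feature in ranking else len(ranking) + 1
            ranks.append(position)
        scores[feature] = sum(ranks) / len(ranks)
    return sorted(features, key=lambda f: scores[f]), scores


# Rankings as reported in the text for Figures 7 and 8.
xgb_ranking = ["effective thickness", "abundance", "depth", "permeability", "gas saturation", "porosity"]
rf_ranking = ["abundance", "pressure", "effective thickness", "depth", "gas saturation", "gas mass density"]
features = ["abundance", "effective thickness", "pressure", "depth",
            "gas saturation", "gas mass density", "permeability", "porosity"]
order, scores = mean_rank([xgb_ranking, rf_ranking], features)
```

Under this aggregation, abundance and effective thickness come out on top, while pressure ranks lower on average because only the RF list includes it, which is consistent with pressure being added on the basis of reservoir characteristics rather than the rankings alone.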
3.3. Distribution types of remaining gas
In this section, a dynamic fine-grained K-means recursive analysis method is proposed, which first classifies each grid by the main control factors, and then recurses each grid within a certain range by combining spatial coordinate parameters. Finally, the classification and evaluation threshold of residual gas in each reservoir layer is determined, and the distribution map of residual gas in each reservoir layer is obtained.
3.3.1. Dataset
The data set used in this section is the productivity data of six layers of the same tight sandstone gas reservoir (Derived from geological model parameters), part of which is shown in Table 4 (this table shows the reserve abundance within each grid in geodetic coordinates, totaling 1.72 million effective grids). The total data includes three main controlling factors: original formation pressure, reservoir thickness, and abundance. These three features are used as input parameters, with the output parameters being the proportion diagrams of various reservoirs at each stratum and the ranges of the main controlling factors.
Table 4. Grid coordinates and dominant factors
Grid | X | Y | Z | Abundance (m³/m²) | Formation pressure (MPa) | Thickness (m) |
|---|---|---|---|---|---|---|
1 | 19394497.1701 | 4301847.5252 | 1279.0287 | 6.03 | 231.12 | 0.40 |
2 | 19394497.1701 | 4301847.5252 | 1279.4237 | 5.22 | 231.13 | 0.39 |
3 | 19394497.1701 | 4301847.5252 | 1279.8162 | 3.18 | 231.13 | 0.39 |
… | … | … | … | … | … | … |
1720649 | 19388769.1571 | 4297840.6810 | 1463.2962 | 0.72 | 233.89 | 0.23 |
1720650 | 19388769.1571 | 4297840.6810 | 1466.04 | 0.58 | 233.93 | 0.23 |
1720651 | 19388769.1571 | 4297840.6810 | 1466.2687 | 0.62 | 233.94 | 0.23 |
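For readers who want to work with data laid out like Table 4, the following Python sketch shows one possible record type and row parser. The names `GridCell` and `parse_row` are hypothetical, and the pipe-delimited layout is assumed to match the table as printed here rather than the format of the original geological model export.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridCell:
    grid_id: int
    x: float          # geodetic X coordinate
    y: float          # geodetic Y coordinate
    z: float          # depth coordinate
    abundance: float  # reserve abundance, m3/m2
    pressure: float   # formation pressure, MPa
    thickness: float  # effective thickness, m


def parse_row(line):
    """Parse one pipe-delimited data row laid out like Table 4."""
    fields = [field.strip() for field in line.split("|") if field.strip()]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return GridCell(int(fields[0]), *map(float, fields[1:]))


cell = parse_row("1 | 19394497.1701 | 4301847.5252 | 1279.0287 | 6.03 | 231.12 | 0.40 |")
print(cell.abundance, cell.pressure, cell.thickness)
```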
3.3.2. Process and results of the dynamic fine-grained K-means recursive method
Using the dynamic fine-grained K-means recursive method, the distribution types of remaining gas in each layer are obtained. Layers 1 and 2 consist primarily of high-remaining gas reservoirs and have high development value. Layer 3 has a relatively large number of high-remaining gas reservoirs and significant development value. Layers 4 and 5 are dominated by medium-remaining gas reservoirs and have a certain level of development value. Layer 6 has little remaining gas and is not suitable for development.
Fine-grained K-means clustering. Ultimately, each layer of the tight sandstone gas reservoir is divided into three categories: red, green, and blue, representing high, medium, and low remaining gas reservoirs, respectively. The high remaining gas reservoirs have a higher reserve abundance and slightly higher formation pressure; the medium remaining gas reservoirs have a medium reserve abundance, thickness, and pressure; and the low remaining gas reservoirs have a lower reserve abundance, thickness, and pressure. The results of clustering all grids based on the main controlling factors are visualized in Figure 9.
Fig. 9. Clustering results (red, green, and blue correspond to the high, medium, and low categories of remaining gas reservoirs)
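The clustering step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the z-score standardization, the deterministic farthest-first initialization, the helper names, and the six toy rows of (abundance, pressure, thickness) are all hypothetical choices.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that the three factors carry comparable weight."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) or 1.0 for col in columns]
    return [tuple((v - c) / s for v, c, s in zip(row, centers, spreads)) for row in rows]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: repeatedly take the point farthest from chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points, k=3, iterations=100):
    """Plain Lloyd iterations; returns one cluster index per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels


def name_clusters(rows, labels, names=("high", "medium", "low")):
    """Order clusters by mean abundance (column 0) and attach category names."""
    order = sorted(set(labels),
                   key=lambda j: mean(r[0] for r, lab in zip(rows, labels) if lab == j),
                   reverse=True)
    lookup = dict(zip(order, names))
    return [lookup[lab] for lab in labels]


# Toy rows (abundance m3/m2, pressure MPa, thickness m); illustrative values only.
rows = [(5.0, 232.3, 0.47), (4.6, 232.2, 0.45), (2.5, 231.2, 0.40),
        (2.3, 231.1, 0.41), (0.9, 230.3, 0.36), (0.8, 230.4, 0.35)]
labels = kmeans(standardize(rows), k=3)
named = name_clusters(rows, labels)
print(named)
```

At the scale of 1.72 million grids, an optimized library implementation of K-means would be the practical choice; the sketch only makes the assignment and update steps explicit.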
The results in Fig. 9 indicate that, after the main controlling factors of the grids are clustered using K-means, the remaining reserves of each layer are labeled. The distribution of remaining gas in each layer is roughly as follows. Layers 1 and 2 are primarily composed of high-remaining gas reservoirs, with a small number of low-remaining gas reservoirs. Layer 3 has a relatively large number of high-remaining gas reservoirs, along with some reservoirs of the other categories. Layers 4 and 5 contain essentially no high-remaining gas reservoirs; medium-remaining gas reservoirs are the main type, with a small number of low-remaining gas reservoirs. Layer 6 has very little remaining gas.
Although the types of remaining gas can be identified accurately, a problem remains: fine-grained K-means labels each grid individually with a color tag, so some areas do not form contiguous regional labels. Many remaining gas grids are scattered, and areas dominated by one label contain a few grids with other labels, which makes it difficult to determine the type of remaining gas in a particular area and to formulate the related exploration and development measures. It is therefore necessary to apply the dynamic fine-grained K-means recursion on top of this result, which makes the results clearer and allows the location of each remaining gas type to be judged well enough to plan how to tap the remaining gas potential.
Dynamic fine-grained K-means recursion. Building on the clustering result, the spatial coordinate parameters are added and formula (13) is used to set the domain e = 0.7 m. For each grid, the number of grids of each color within its domain is counted. If one color accounts for at least 80% of the grids in the domain, the grid is assigned that color. This process is repeated until all grids have completed the recursion, and the results are shown in Figure 10.
Fig. 10. Recursion results (red, green, and blue correspond to the high, medium, and low categories of remaining gas reservoirs)
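The neighborhood relabeling rule described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that the domain is a Euclidean ball of radius e around each grid (formula (13) is not reproduced here), that a grid counts itself among its neighbors, that all grids are updated together in each pass, and that passes repeat until no label changes. The function name `smooth_labels` and the `max_passes` cap are hypothetical.

```python
from collections import Counter
from math import dist


def smooth_labels(coords, labels, radius=0.7, share=0.8, max_passes=50):
    """Relabel each grid when one label fills at least `share` of its domain."""
    labels = list(labels)
    for _ in range(max_passes):
        updated = list(labels)
        for i, centre in enumerate(coords):
            # Brute-force neighbor search, O(n^2); the grid itself is included.
            nearby = [labels[j] for j, other in enumerate(coords)
                      if dist(centre, other) <= radius]
            label, count = Counter(nearby).most_common(1)[0]
            if count >= share * len(nearby):
                updated[i] = label
        if updated == labels:
            break  # no grid changed, so the relabeling has converged
        labels = updated
    return labels


# Toy example: seven grids spaced 0.2 m apart, one green grid inside a red run.
coords = [(i * 0.2, 0.0, 0.0) for i in range(7)]
before = ["red", "red", "red", "green", "red", "red", "red"]
after = smooth_labels(coords, before)
print(after)
```

For a model with millions of grids, the brute-force neighbor search would need to be replaced by a spatial index or by the regular structure of the grid itself; the sketch only shows the voting rule.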
After the dynamic fine-grained K-means recursion, the results are significantly clearer, and the location of the remaining gas can be determined accurately. Layers 1 and 2 are mainly high residual gas reservoirs with great development potential; in the upper left area and in Layer 2, production can be increased by opening additional beds. In Layer 3, both high and medium remaining gas reservoirs are numerous and the development potential is great, and wells can be opened at the left, middle, and other locations rich in remaining gas. Layers 4 and 5 are mainly medium remaining gas reservoirs with a small number of low-production reservoirs, and have moderate development potential. Layer 6 has very little remaining gas and is basically not suitable for development.
The locally magnified comparison charts for Layers 3 and 4 give a clearer view of the effects of recursion (Figures 11, 12).
Fig. 11. Comparison diagram before (left) and after (right) recursion in the third layer
Fig. 12. Local amplification comparison of various parts in the third layer
It is observed that in Fig. 11b, the relatively dispersed remaining gas is mainly classified as non-reservoir grids during recursion. In Fig. 11a, c, there are also dispersed points classified as non-reservoir grids. Moreover, in the upper left areas of Fig. 11c, there is a predominance of blue grids with relatively few red grids. After recursion, they predominantly become blue.
Comparing Figure 13a-d, it is noted that in Fig. 13a, the relatively dispersed remaining gas is mainly classified as non-reservoir grids during recursion, and the red grids within the green are recursively classified as green. In Fig. 13b, the relatively dispersed remaining gas is recursively classified as non-reservoir grids, and the blue grids within the green are recursively classified as green. In Fig. 13c, d, both contain some green grids within the blue, which, after recursion, all become blue grids.
Fig. 13. Comparison diagram before (left) and after (right) recursion in the fourth layer
The four positions are compared separately in Figure 14.
Fig. 14. Recursion results (red, green, and blue correspond to the high, medium, and low categories of remaining gas reservoirs)
Finally, the average values of the main controlling factors for each type of remaining gas are obtained (Table 5).
Table 5. Average values of the three remaining gas categories
Main controlling factors | Category 1 | Category 2 | Category 3 |
|---|---|---|---|
Abundance (m3/m2) | 4.81 | 2.4 | 0.9 |
Thickness (m) | 0.46 | 0.4 | 0.36 |
Formation pressure (MPa) | 232.24 | 231.19 | 230.33 |
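Averages such as those in Table 5 can be computed by grouping the labeled grids by category. The sketch below is illustrative only: the function name `category_means` is hypothetical, the six records reuse the (abundance, thickness, pressure) values of the rows shown in Table 4, and the category labels attached to them are assumed for the example rather than taken from the study's results.

```python
from collections import defaultdict
from statistics import mean


def category_means(records, labels):
    """Return, per category, the column-wise mean of its records."""
    grouped = defaultdict(list)
    for record, label in zip(records, labels):
        grouped[label].append(record)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in grouped.items()}


# (abundance m3/m2, thickness m, pressure MPa) from the rows printed in Table 4.
records = [(6.03, 0.40, 231.12), (5.22, 0.39, 231.13), (3.18, 0.39, 231.13),
           (0.72, 0.23, 233.89), (0.58, 0.23, 233.93), (0.62, 0.23, 233.94)]
# Hypothetical labels for illustration; not the categories assigned in this study.
assumed_labels = ["high", "high", "medium", "low", "low", "low"]

means = category_means(records, assumed_labels)
for category, values in means.items():
    print(category, [round(v, 2) for v in values])
```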
The fine-grained K-means recursive method concludes that: Layers 1 and 2 have abundant remaining gas reserves and are the key development reservoirs; Layer 3 has a relatively large remaining gas reserve and requires development in certain areas; Layers 4 and 5 have medium remaining gas reserves and have some development value. Layer 6 has no development value.
Utilizing the dynamic fine-grained K-means recursive algorithm, the distribution type of remaining gas in each layer of the block is obtained, providing insights for subsequent development processes. For example:
Adding Wells: Wells should be added in areas rich in remaining gas that are not covered by the control radius of other wells. This mainly considers the positions with a higher distribution of remaining gas in Layers 1, 2, and 3, and the necessary areas for adding wells are determined by comparing the well locations and control radii.
Opening Strata: In areas where wells have already been drilled, the corresponding strata should be opened to increase recovery rates. This mainly considers the distribution of types 1 and 2 remaining gas in Layers 1, 2, 3, 4, and 5, and the necessary strata to be opened are determined by comparing the well location and control radius diagrams.
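The well-adding screen described above can be sketched as a simple coverage check. This is a minimal illustration under assumptions: the function name `uncovered_targets`, the planar distance measure, the 300 m control radius, and the toy coordinates are all hypothetical and are not taken from the study area.

```python
from math import hypot


def uncovered_targets(grids, wells, control_radius, wanted=("high",)):
    """Return grids of a wanted category lying outside every well's control radius."""
    targets = []
    for x, y, category in grids:
        if category not in wanted:
            continue
        covered = any(hypot(x - wx, y - wy) <= control_radius for wx, wy in wells)
        if not covered:
            targets.append((x, y, category))
    return targets


# Toy layout in metres; values are illustrative only.
grids = [(100.0, 100.0, "high"), (900.0, 900.0, "high"),
         (950.0, 900.0, "medium"), (120.0, 80.0, "low")]
wells = [(0.0, 0.0)]
candidates = uncovered_targets(grids, wells, control_radius=300.0)
print(candidates)
```

The complementary screen for opening strata would keep the high and medium grids that do fall inside a control radius, for example by inverting the `covered` test and passing `wanted=("high", "medium")`.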
The distribution of remaining gas types was studied with both K-means and the dynamic fine-grained K-means recursive algorithm. Although K-means can judge the remaining gas types at fine granularity, it cannot form unified labels for certain regions. The dynamic fine-grained K-means recursive algorithm builds on the K-means result and determines the distribution of remaining gas types clearly and effectively. The analysis indicates that Layers 1-3 have good development value and require additional wells or opened strata in some areas. Layers 4-5 have some development value, and certain strata can be opened to increase recovery. Layer 6 has no development value. These conclusions also agree with the earlier geological analysis of the gas reservoir: Layers 1, 2, and 3 exhibit poor physical properties, strong heterogeneity, and relatively high connectivity, which result in a greater amount of remaining gas.
4. Conclusions
For models with diverse data types and large volumes of data, this study proposes first using feature selection to screen the dominant factors. For each sublayer, nine influencing factors (depth, porosity, pressure, gas saturation, gas mass density, permeability, remaining gas quantity, thickness, and abundance) are screened with XGBoost and RF. In addition, this paper introduces a novel algorithm, the dynamic fine-grained K-means recursive algorithm, which clusters remaining gas based on the dominant factors combined with their geodetic coordinates, forming a reserve distribution map for each layer and establishing standards for the classification of tight gas reservoirs. The main results are summarized as follows:
Through the analysis of heterogeneity and connectivity in the geological analysis of the gas reservoir, permeability, porosity, depth, and pressure are identified as factors affecting remaining gas. Combined with the geological characteristics of the study area, gas saturation, gas mass density, thickness, and abundance are also identified as influencing factors.
XGBoost and RF can accurately screen out the main factors affecting the distribution of remaining gas. With machine learning feature selection as the main approach and expert experience assessment as a supplement, the dominant factors are identified as reserve abundance, effective thickness, and formation pressure. The model has a high accuracy rate, low errors of various types, and a good fitting effect.
A dynamic fine-grained K-means recursive method is proposed that clusters the dominant factors grid by grid in each stratum. The method uses unsupervised learning to label the distribution of remaining gas and then, with the spatial coordinate parameters, recursively forms the remaining gas classification types. Ultimately, the ranges and distribution maps of the dominant factors for the three types of remaining gas in each stratum are obtained. The results indicate that Layers 1 and 2 are the focus of development, Layers 3, 4, and 5 require redevelopment in certain areas, and Layer 6 has no development value.
References
1. Zhen-Xue, Jiang; Zhuo, Li; Feng, Li et al. Tight sandstone gas accumulation mechanism and development models[J]. Petroleum Science (English); 2015; 4, pp. 587-605.
2. Jiongfan, Wei; Jingong, Zhang; Zishu, Yong. Characteristics of Tight Gas Reservoirs in the Xujiahe Formation in the Western Sichuan Depression: A Systematic Review[J]. Energies; 2024; 17,
3. Shuyun, Liu. Review of the Development Status and Technology of Tight Oil: Advances and Outlook[J]. Energy & Fuels; 2023; 37,
4. Jinxing, Dai; Yunyan, Ni; Xiaoqi, Wu. Tight gas in China and its significance in exploration and exploitation[J]. Petroleum Exploration and Development; 2012; 39,
5. Jia Ailin, Wei Yunsheng, Guo Zhi .et al. Development status and prospect of tight sandstone gas in China[J]. Natural Gas Industry B, 2022, 9(5): 467-476.
6. Lei, Jiang; Ronghai, Zhou; Gang, Li. Accumulation characteristics of tight sandstone gas reservoirs in the Upper Paleozoic in the Shenfu block of the Ordos Basin[J]. IOP Conference Series: Earth and Environmental Science; 2021; 768, 012022.
7. Zhang Huiliang, Zhang Ronghu, Yang Haijun et al. Characterization and evaluation of ultra-deep fracture-pore tight sandstone reservoirs: A case study of Cretaceous Bashijiqike Formation in Kelasu tectonic zone in Kuqa foreland basin, Tarim, NW China[J].Petroleum Exploration and Development, 2014, 41(2): 175-184.
8. Zhu Haihua, Zhang Tingshan, Zhong Dakang.et al. Binary pore structure characteristics of tight sandstone reservoirs[J]. Petroleum Exploration and Development,2019, 46(6): 1297-1306.
9. Jiang Lin, Zhao Wen, Bo Dong-Mei et al. Tight sandstone gas accumulation mechanisms and sweet spot prediction,Triassic Xujiahe Formation,Sichuan Basin,China[J].Petroleum Science,2023,(6): 3301-3310.
10. Cao Gaohui, Lin Mian, Zhang Likuan.et al. Numerical simulation of the dynamic migration mechanism and prediction of saturation of tight sandstone oil[J]. Science China Earth Sciences, 2024, 67 (1): 179-195.
11. Ibrahim, Ahmed Farid, Elkatatny Salaheldin Mahmoud, Abdelraouf Yasmin A. et al.Application of Various Machine Learning Techniques in Predicting Water Saturation in Tight Gas Sandstone Formation[J].Journal of Energy Resources Technology, Transactions of the ASME, 2022, 144(8): 083009.
12. Shi Su-Zhen, Shi Gui-Fei, Pei Jin-Bo.et al. Porosity prediction in tight sandstone reservoirs based on a one–dimensional convolutional neural network–gated recurrent unit model[J]. Applied Geophysics, 2023: 1-13.
13. Li, Hui; Qiu, Bo; Zhang, Yonghao et al. CNN-Based Network Application for Petrophysical Parameter Inversion: Sensitivity Analysis of Input-Output Parameters and Network Architecture[J]. IEEE Transactions on Geoscience and Remote Sensing; 2022; 60, pp. 1-13.
14. Yanqiu, Zhou; Xiaoqing, Zhao; Chengzhou, Jiang et al. Permeability prediction of multi-stage tight gas sandstones based on Bayesian regularization neural network [J]. Marine & Petroleum Geology; 2021; 133, [DOI: https://dx.doi.org/10.1016/j.marpetgeo.2021.105320] 105320.
15. Xuefei, Lu; Xing, Xin; Kelai, Hu et al. Classification and Evaluation of Tight Sandstone Reservoirs Based on MK-SVM[J]. Processes; 2023; 11,
16. Longfei, Ma; Hanmin, Xiao; Jingwei, Tao et al. An intelligent approach for reservoir quality evaluation in tight sandstone reservoir using gradient boosting decision tree algorithm[J]. Open Geosciences; 2022; 14,
17. Jing-Jing Liu, Jian-Chao Liu. Integrating deep learning and logging data analytics for lithofacies classification and 3D modeling of tight sandstone reservoirs[J].Geoscience Frontiers, 2022, 13 (1): 356-369.
18. Liu, Fang; Wang, Xin; Liu, Zongbao et al. Identification of tight sandstone reservoir lithofacies based on CNN image recognition technology: A case study of Fuyu reservoir of Sanzhao Sag in Songliao Basin[J]. Geoenergy Science and Engineering; 2023; 222, [DOI: https://dx.doi.org/10.1016/j.geoen.2023.211459] 211459.
19. Xiaobo Zhao, Xiaojun Chen,Wen Chen, et al. Quantitative Classification and Prediction of Diagenetic Facies in Tight Gas Sandstone Reservoirs via Unsupervised and Supervised Machine Learning Models: Ledong Area, Yinggehai Basin[J]. Natural Resources Research, 32(6): 2685-2710.
20. Cherana Amina, Aliouane Leila, Doghmane Mohamed Z., et al. Lithofacies discrimination of the Ordovician unconventional gas-bearing tight sandstone reservoirs using a subtractive fuzzy clustering algorithm applied on the well log data: Illizi Basin, the Algerian Sahara [J]. Journal of African Earth Sciences, 2022, 196: 104732.
21. Mei, He; Hanming, Gu; Huan, Wan. Log interpretation for lithology and fluid identification using deep neural network combined with MAHAKIL in a tight sandstone reservoir [J]. Journal of Petroleum Science & Engineering; 2020; 194, [DOI: https://dx.doi.org/10.1016/j.petrol.2020.107498] 107498.
22. Cherana Amina, Aliouane Leila, Doghmane Mohamed Z. et al. Lithofacies discrimination of the Ordovician unconventional gas-bearing tight sandstone reservoirs using a subtractive fuzzy clustering algorithm applied on the well log data: Illizi Basin, the Algerian Sahara [J]. Journal of African Earth Sciences, 2022, 196:104732.
23. Longfei, Ma; Hanmin, Xiao; Jingwei, Tao et al. An intelligent approach for reservoir quality evaluation in tight sandstone reservoir using gradient boosting decision tree algorithm[J]. Open Geosciences; 2022; 14,
24. Cheng Baoyang, Li Junjian, Jiang Shuai, et al. Pore-Scale Investigation of Microscopic Remaining Oil Variation Characteristic in Different Flow Rates Using Micro-CT [J]. Energies, 2021, 14(11): 3057.
25. Gang, Hui; Shengnan, Chen; Youjing, Wang et al. The Effect of Hydraulic-Natural Fracture Networks on the Waterflooding Development in a Multilayer Tight Reservoir: Case Study [J]. Geofluids; 2021; 2021, pp. 1-15.
26. Jinwu, Zhang; Guoyong, Wang; Kai, He et al. Practice and understanding of sidetracking horizontal drilling in old wells in Sulige Gas Field, NW China[J]. Petroleum Exploration and Development; 2019; 46,
27. Guang, Ji; Ailin, Jia; Dewei, Meng et al. Technical strategies for effective development and gas recovery enhancement of a large tight gas field: A case study of Sulige gas field, Ordos Basin, NW China[J]. Petroleum Exploration and Development; 2019; 46,
28. Hrishikesh K. Chavan, Rajib K. Sinharay, Vijay Kumar, et al. An approach of using machine learning classification for screening of enhanced oil recovery techniques[J]. Petroleum Science and Technology, 2023: 1-19.
29. Sungil Kim, Kwang Hyun Kim, Jung-Tek Lim. Synergistic enhancement of productivity prediction using machine learning and integrated data from six shale basins of the USA[J].Geoenergy Science and Engineering, 2023, 229: 212068.
30. Liqiang Pei, Jinyuan Shen, Runjie Liu. Deep Feature of Image Screened by Improved Clustering Algorithm Cascaded with Genetic Algorithm[A]. 29th China Control and Decision-making Conference [C], 2017-05-28.
31. Zhao, Wang; Hongming, Tang; Yawei, Hou et al. Quantitative evaluation of unconsolidated sandstone heavy oil reservoirs based on machine learning [J]. Geological Journal; 2022; 58,
32. Jiyuan Zhang;Yanchun Sun;Lin Shang.A unified intelligent model for estimating the (gas + n-alkane) interfacial tension based on the eXtreme gradient boosting (XGBoost)trees[J]. Fuel, 2020, 282:118783.
33. Zhang, Jiyuan; Feng, Qihong; Zhang, Xianmin. A Supervised Learning Approach for Accurate Modeling of CO2-Brine Interfacial Tension with Application in Identifying the Optimum Sequestration Depth in Saline Aquifers[J]. Energy & Fuels; 2020; 34,
34. Yang, Li; Tingshan, Zhang; Zongyang, Dai et al. Quantitative evaluation methods of tight reservoirs based on multi-feature fusion: A case study of the fourth member of Shahejie Formation in Liaohe Depression(Article)[J]. Journal of Petroleum Science and Engineering; 2021; 198, [DOI: https://dx.doi.org/10.1016/j.petrol.2020.108090] 108090.
© Springer Science+Business Media, LLC, part of Springer Nature 2025.