The verification of mathematical models for multistage reciprocating compressors is crucial for ensuring their accuracy and reliability. In this study, MATLAB-based models were developed to simulate the real-world operating conditions of a single-stage reciprocating compressor, a multistage reciprocating compressor without an intercooler, and a multistage reciprocating compressor with an intercooler, and different machine learning (ML) models were used to verify their results. The verification focuses on key performance indicators: the pressure–volume (PV) graph, the outlet temperature graph, and the volumetric efficiency vs. pressure ratio graph. The MATLAB model computes thermodynamic parameters, such as the power required, outlet pressure, and outlet temperature, for various operating conditions. For the single-stage compressor, the MATLAB model showed that the outlet pressure rose to 1.6 times the inlet pressure, the volume decreased by 20% relative to the inlet volume, and the outlet temperature increased by 30% relative to the inlet temperature. For the multistage compressor without an intercooler, the outlet pressure rose to about 3.3–3.6 times the inlet pressure, the volume decreased by 60% relative to the inlet volume, and the outlet temperature increased by 35% relative to the inlet temperature.
For the multistage compressor with an intercooler, the pressure rose to three times the inlet pressure after the first stage of compression and to six times the inlet pressure after the second stage, and the volume was reduced by approximately 80%. The intercooler limited the rise in outlet temperature to 30%, preventing excessive expansion of the air in the compressor, and increased the efficiency of the compressor by 12% compared with the multistage compressor without an intercooler. In addition, the results predicted by all the ML models agreed with those generated by the MATLAB model for all three compressors, with an accuracy of approximately 90% or more for almost all the models. Comparing the predicted outputs of the ML models with the MATLAB-generated results made it possible to assess the accuracy and consistency of the simulations. This study aims to bridge the gap between traditional mathematical modeling and modern data-driven validation techniques to ensure robustness in compressor performance predictions.
Introduction
Multistage reciprocating compressors, which are used in refrigeration, gas transport, and process engineering, require mathematical modeling to optimize their performance, predict their efficiency, and analyze their reliability. Their operation is governed by ideal-gas thermodynamics and by polytropic and isentropic compression processes, together with energy conservation that accounts for frictional losses, heat transfer effects, the dynamic behavior of the valves, and pressure drops during suction, compression, and discharge of the gas. Mathematical models of reciprocating compressors typically combine fluid dynamics with thermodynamics; the models proposed to date describe mass flow rate, heat transfer, and pressure–volume (PV) relations. For example, Hsieh et al. [4] developed a theoretical model of oil-injected screw compressors that combines the conservation of mass and energy with heat transfer. Sjöstedt [13] used MATLAB/Simulink to simulate displacement compressors and argued for the advantages of this visual programming approach for performance prediction and optimization. In addition, Kovacevic et al. [7] presented computational fluid dynamics (CFD) as a research tool for investigating the internal flow of screw compressors to optimize their design. However, because of its long computation time, CFD may not be suitable for real-time optimization.
Emerging ML methods have opened new avenues for predicting compressor performance and assessing its optimization [9]. Gaussian process regression (GPR) and Bayesian optimization have proven particularly useful because of their predictive capability [6, 8]. Kumar et al. [8] used GPR to make predictions from key geometric parameters of a screw compressor and compared its predictive capability with that of polynomial regression and neural networks. Optimization work on high-pressure compressor models has focused on reducing computation time while maintaining accuracy [10]. Fault detection in reciprocating compressors has been achieved using condition-monitoring techniques such as motor current signature analysis (MCSA) [3]. In 2018, Haba et al. developed an MCSA model of a two-stage reciprocating compressor to detect faults such as valve leakage and stator winding asymmetry.
Combining ML with traditional modeling has also advanced the performance analysis of compressors in the air-conditioning domain. Studies have demonstrated that ML, coupled with feature fusion procedures, can accurately diagnose faults in air compressors [11]. Zhong et al. [15] presented an approach to compressor performance prediction based on support vector machines (SVMs) whose parameters are tuned by a genetic algorithm. Their method combines linear and cubic spline interpolation to generate training samples, screens the SVM kernel, and uses a genetic algorithm to optimize the SVM parameters. The effectiveness of ML-based predictive maintenance systems has been investigated in various industries, including for centrifugal and axial flow compressors [1, 12]. For example, Achouch et al. [1] applied Industry 4.0 machine-learning algorithms to fault detection and maintenance prediction for TA-48 multistage centrifugal plant compressors approaching failure, using LSTM neural networks to estimate the remaining useful life (RUL) and thereby minimize downtime. In the context of contemporary smart techniques, Pakatchian et al. [12] compared the effects of ML applications on the simulation of the aerodynamic behavior of axial flow compressors and on performance prediction from numerical simulations. Hao et al. [5] compared artificial neural networks and parameter estimation methods for predicting the characteristic curves of a PG9351FA gas turbine, casting doubt on the accuracy and range of applicability of both methods. Ying et al. [14] showed that compressor performance models can be generated by nonlinear regression with support vector machine algorithms combined with characteristic map generation. Aminzadeh et al. [2] discussed predictive maintenance and monitoring of industrial compressors using a system that integrates advanced data acquisition with ML algorithms.
In the present investigation, MATLAB simulations were used to model a single-stage reciprocating compressor, a multistage reciprocating compressor, and a multistage reciprocating compressor with an intercooler. The models incorporate the governing thermodynamic equations and were simulated under practical operating conditions. The simulated results were then used to train several ML models for quantitative analysis: linear regression with polynomial features, a decision tree regressor, a random forest regressor, a support vector regressor, and XGBoost. These ML models predict the performance graphs, namely the pressure–volume (PV) diagram, the outlet temperature–time graph, and the volumetric efficiency–pressure ratio graph. These graphs were selected because they summarize compressor performance, distinguish between the compressor types, and provide information about efficiency.
The paper is structured as follows. The introduction provides the background of the study, reviews relevant research on mathematical and ML-based modeling of compressors, and outlines the organization of the paper. The methodology describes the development of the MATLAB models and their validation using theoretical calculations and ML-based frameworks. The next section describes how the simulations generate datasets under different operating conditions, followed by the ML model development phase. Finally, the results and comparison section assesses the performance of the models and argues that multistage reciprocating compressors with intercoolers are the most efficient. The study thus shows that data-driven approaches can validate compressor mathematical models, making ML a viable means of improving modeling accuracy and predictive capability.
Methodology
MATLAB model development
Mathematical modeling underpins engineering and scientific studies, using numerical techniques and equation-based analysis to describe complex systems. Such models are built on physical laws and empirical data to create a structured representation of a system's behavior. In modern mechanical engineering, mathematical modeling plays a significant role in structural analysis, thermodynamic optimization, fluid mechanics, and control system design. MATLAB offers strong capabilities for numerical analysis, data visualization, and simulation, and the graphical modeling environment of Simulink is particularly valued for control system design and dynamic simulation. MATLAB is therefore widely relied upon in engineering applications such as vibration analysis, heat transfer modeling, and aerodynamics, where it helps reduce evaluation costs and development time.
Recently, combining artificial intelligence and computational intelligence with classical numerical techniques in MATLAB has opened new possibilities for prediction and decision-making models. This combination improves the precision and efficiency of engineering simulations and supports the growth of data-driven work. It is particularly valuable in fast-growing areas of the digital industry, such as robotics, biomechanics, renewable energy, and smart manufacturing, where it facilitates design optimization and improves sustainability.
This study develops comprehensive mathematical models of a single-stage reciprocating compressor, a multistage reciprocating compressor, and a multistage reciprocating compressor with an intercooler using MATLAB Simulink simulations. These models use numerical computation to represent the thermodynamic and mechanical behavior of the gas during compression, providing a realistic evaluation of compressor performance under various operating conditions. Figure 1 illustrates the MATLAB model of the multistage reciprocating compressor used in this study. The MATLAB model can be used alongside other theoretical and experimental models, supporting the move toward data-driven approaches in engineering research and industry.
[See PDF for image]
Fig. 1
MATLAB model of multistage reciprocating compressor
The MATLAB Simulink model combines the essential components needed to simulate the dynamics and performance of the compressor. The solver configuration sets the simulation parameters and appropriate numerical solver options, ensuring the reliability and stability of the model. The physical signal ramp captures time-dependent variations in system behavior, allowing the dynamic operation of the compressor to be simulated under different loads. The constant-volume chamber lies at the heart of the model, continuously simulating the mass and energy storage of the gas; it determines the pressure and temperature changes that occur during compression. The positive-displacement compressor block represents the reciprocating mechanics of the compressor, including piston motion through the intake, compression, and discharge phases, and is therefore critical for simulating the operation of a reciprocating compressor. To represent the rotor dynamics accurately, the Mechanical Rotational Reference and Angular Velocity Source blocks are connected to the model; these components define the rotor's dynamic behavior so that speed variations are modeled precisely. In addition, a perfect insulator represents thermally insulated boundaries, excluding unwanted heat losses during operation and increasing the overall accuracy of the model results. The key input parameters of the simulation model are the rotational speed (in revolutions per minute), inlet pressure, and inlet temperature, which are critical in determining the performance characteristics of the compressor.
The primary outputs of the simulation are the power required for compression, the outlet pressure of each compression stage, and the outlet temperature of the compressed gas. Together, these components and parameters form the basis of a complete and realistic simulation of reciprocating compressor operation, providing valuable performance feedback under various operating conditions that can be used for further optimization and analysis in research and industrial practice.
Mathematical formulation
The Simulink model is built upon the fundamental thermodynamic principles that govern the operation of a reciprocating compressor. The key performance indicators are calculated using the following governing equations.
The mass flow rate (m˙) is a function of the volumetric efficiency (ηv), inlet gas density (ρ1), cylinder displacement volume (Vd), and crankshaft rotational speed (N).
The volumetric efficiency (ηv), which accounts for the impact of clearance volume on the gas flow, is determined by the clearance ratio (C) and the pressure ratio across the compressor.
The final discharge temperature (T4) after polytropic compression is calculated based on the inlet temperature and the pressure ratio. This relationship is crucial for assessing the thermal load on the compressor.
Finally, the theoretical power consumption (Pth) required for the compression process is calculated assuming isentropic compression. The variables are defined as follows:
C: Clearance ratio
k: Specific heat ratio
n: Polytropic index of compression
N: Rotational speed (rpm)
P1: Inlet pressure
P3: Final discharge pressure
Pth: Theoretical power consumption
R: Specific gas constant
T1: Inlet temperature
T4: Final discharge temperature
Vd: Displacement volume
m˙: Mass flow rate
ηv: Volumetric efficiency
ρ1: Inlet gas density
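For reference, the four relations described above can be written in their common textbook forms, which are consistent with the variable list. These forms are an assumption rather than a transcription of the Simulink model: the factor N/60 presumes N in rpm and one intake stroke per revolution (a single-acting cylinder), ρ1 is taken from the ideal gas law, and for a multistage machine the relations would presumably be applied stage by stage with each stage's pressure ratio.

```latex
\begin{aligned}
\dot{m} &= \eta_v \, \rho_1 \, V_d \, \frac{N}{60}, \qquad \rho_1 = \frac{P_1}{R\,T_1} \\
\eta_v &= 1 + C - C\left(\frac{P_3}{P_1}\right)^{1/n} \\
T_4 &= T_1\left(\frac{P_3}{P_1}\right)^{(n-1)/n} \\
P_{th} &= \dot{m}\,\frac{k}{k-1}\,R\,T_1\left[\left(\frac{P_3}{P_1}\right)^{(k-1)/k} - 1\right]
\end{aligned}
```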
These equations constitute the core logic implemented within the MATLAB Simulink environment to simulate the compressor's performance.
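A minimal Python sketch of these four relations is given below, assuming their common textbook forms (clearance-volume volumetric efficiency, polytropic discharge temperature, isentropic power, a single-acting cylinder with N in rpm, and ideal-gas inlet density). The function names and the example operating point are hypothetical and are not the study's inputs; the sketch is a quick cross-check, not the Simulink implementation.

```python
def inlet_density(P1: float, R: float, T1: float) -> float:
    """Ideal-gas inlet density rho1 = P1 / (R * T1) in kg/m^3."""
    return P1 / (R * T1)


def volumetric_efficiency(C: float, pressure_ratio: float, n: float) -> float:
    """Clearance-volume model: eta_v = 1 + C - C * (P3/P1)**(1/n)."""
    return 1.0 + C - C * pressure_ratio ** (1.0 / n)


def mass_flow_rate(eta_v: float, rho1: float, Vd: float, N_rpm: float) -> float:
    """m_dot = eta_v * rho1 * Vd * N/60, assuming one intake stroke per revolution."""
    return eta_v * rho1 * Vd * N_rpm / 60.0


def discharge_temperature(T1: float, pressure_ratio: float, n: float) -> float:
    """Polytropic discharge temperature T4 = T1 * (P3/P1)**((n - 1)/n)."""
    return T1 * pressure_ratio ** ((n - 1.0) / n)


def isentropic_power(m_dot: float, k: float, R: float, T1: float, pressure_ratio: float) -> float:
    """Theoretical power Pth = m_dot * k/(k-1) * R * T1 * ((P3/P1)**((k-1)/k) - 1) in W."""
    return m_dot * k / (k - 1.0) * R * T1 * (pressure_ratio ** ((k - 1.0) / k) - 1.0)


if __name__ == "__main__":
    # Hypothetical operating point for air (not taken from the study)
    P1, T1, P3 = 101_325.0, 300.0, 6 * 101_325.0  # Pa, K, Pa
    R, k, n, C = 287.0, 1.4, 1.3, 0.05
    Vd, N = 1.0e-3, 1500.0  # m^3 per stroke, rpm
    r = P3 / P1
    eta_v = volumetric_efficiency(C, r, n)
    m_dot = mass_flow_rate(eta_v, inlet_density(P1, R, T1), Vd, N)
    print(f"eta_v = {eta_v:.3f}, m_dot = {m_dot:.4f} kg/s")
    print(f"T4 = {discharge_temperature(T1, r, n):.1f} K, "
          f"Pth = {isentropic_power(m_dot, k, R, T1, r) / 1000:.2f} kW")
```

Keeping each relation in its own function makes it easy to compare any single intermediate quantity (for example, ηv at a given pressure ratio) with the corresponding MATLAB output.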
Model validation
The MATLAB model was validated using a machine-learning-based methodology: ML models were trained to predict the compressor's performance metrics, and their predictions were compared with the MATLAB results. The performance outputs considered are PV diagrams, outlet temperature–time graphs, and volumetric efficiency vs. pressure ratio graphs, predicted from the input conditions using several machine learning algorithms: linear regression with polynomial features, decision tree regressor, random forest regressor, support vector regressor, and XGBoost. The outputs of these models are then compared with those of MATLAB to check the consistency and accuracy of each model in predicting compressor behavior. This approach strengthens the reliability of data-driven validation and demonstrates the value of machine learning for modeling the complex thermodynamic and mechanical processes in reciprocating compressors.
Simulation and data generation
To study compressor performance, the three types of reciprocating compressors were simulated under different operating conditions. The results make it possible to assess the efficiency and performance of each compressor under varying load conditions. The dataset columns used include the inlet pressure, inlet temperature, RPM, pressure ratio, and volumetric efficiency. All datasets were then used to train the machine learning models.
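The data-generation step can be sketched as a sweep over every combination of operating conditions, with one simulation run per combination. In the sketch below, `simulate` is a hypothetical stand-in for one run of the MATLAB model (the text does not say how the model is invoked), and the column names are assumptions mirroring the dataset columns listed above; pandas is assumed as the container, consistent with the preprocessing described later.

```python
import itertools
from typing import Callable, Dict, Iterable

import pandas as pd


def build_dataset(
    inlet_pressures: Iterable[float],
    inlet_temperatures: Iterable[float],
    speeds_rpm: Iterable[float],
    simulate: Callable[[float, float, float], Dict[str, float]],
) -> pd.DataFrame:
    """Run `simulate` once per operating point and collect inputs and outputs as rows."""
    rows = []
    for p_in, t_in, rpm in itertools.product(inlet_pressures, inlet_temperatures, speeds_rpm):
        # `simulate` returns output columns, e.g. {"pressure_ratio": ..., "volumetric_efficiency": ...}
        outputs = simulate(p_in, t_in, rpm)
        rows.append({"inlet_pressure": p_in, "inlet_temperature": t_in, "rpm": rpm, **outputs})
    return pd.DataFrame(rows)
```

For example, `build_dataset([90e3, 101.325e3], [290.0, 300.0], [1000.0, 1500.0, 2000.0], run_model)` would produce 12 rows, where `run_model` is a hypothetical wrapper around a single simulation; the same function can be reused for all three compressor configurations.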
Performance metrics
The graphs for the single-stage compressor, the multistage compressor without an intercooler, and the multistage compressor with an intercooler were analyzed to study trends in efficiency and compression at different stages and conditions of the compression cycle. Representing the key performance parameters graphically shows how the thermodynamic behavior of reciprocating compressors changes as the operating conditions vary.
Pressure–volume diagrams
Figure 2 shows how the pressure changes with the cylinder volume at different compression stages. The area enclosed by the PV loop equals the work done in the compression cycle, which makes the diagram useful for examining the efficiency of the cycle. PV diagrams help quantify energy losses, assess how closely the process follows ideal compression, and guide design improvements for better performance.
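Because the cycle work is the area enclosed by the PV loop, it can be computed directly from sampled (V, P) points. The sketch below uses the shoelace formula, assuming the points are ordered sequentially around one closed cycle and given in SI units; the conversion to power assumes one compression cycle per revolution. The function names are hypothetical.

```python
import numpy as np


def cycle_work(volume, pressure) -> float:
    """Net work of one closed P-V loop (the enclosed area), via the shoelace formula.

    Points must be ordered sequentially around the cycle; with V in m^3 and
    P in Pa the result is in joules per cycle.
    """
    v = np.asarray(volume, dtype=float)
    p = np.asarray(pressure, dtype=float)
    if v.shape != p.shape or v.ndim != 1 or v.size < 3:
        raise ValueError("need matching 1-D arrays with at least 3 points")
    # Shoelace: 0.5 * |sum(v_i * p_{i+1} - v_{i+1} * p_i)|, wrapping the last point to the first
    return float(0.5 * abs(np.dot(v, np.roll(p, -1)) - np.dot(p, np.roll(v, -1))))


def indicated_power(work_per_cycle: float, N_rpm: float) -> float:
    """Indicated power in W, assuming one compression cycle per revolution."""
    return work_per_cycle * N_rpm / 60.0
```

Applying `cycle_work` to the PV points of each compressor configuration gives a single number per cycle that can be compared across configurations or between MATLAB and ML-predicted diagrams.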
[See PDF for image]
Fig. 2
PV (pressure–volume) diagram given by the MATLAB model
Outlet temperature vs. time graph
Figure 3 shows the variation in outlet temperature over time, which makes it possible to investigate the transient thermal behavior of reciprocating compressors under different loads. Tracking these temperature changes reveals heat build-up, cooling effectiveness, and the thermal shocks sustained by compressor components, which helps engineers design better heat-dissipation and lubrication arrangements for reliable operation over extended periods.
[See PDF for image]
Fig. 3
Outlet temperature vs time graph given by the MATLAB model
Volumetric efficiency vs pressure ratio
The relationship between volumetric efficiency (ηᵥ) and pressure ratio (PR) shown in Fig. 4 is critical for evaluating compressor intake efficiency, power consumption, and thermodynamic behavior. As the pressure ratio increases, the gas is heated more during compression, and the residual gas left in the clearance volume re-expands over a larger part of the suction stroke. This reduces the volume available for fresh intake air, lowering ηᵥ and limiting the volumetric flow per cycle. Beyond a certain pressure ratio, the efficiency curve falls sharply; the worst case occurs when re-expansion of the residual gas dominates and little fresh gas is drawn in.
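The clearance effect described above can be illustrated with the textbook estimate ηᵥ = 1 + C − C·PR^(1/n). In this idealized model (which ignores heat transfer and valve losses), ηᵥ falls as PR rises and reaches zero at a limiting ratio ((1 + C)/C)^n, where the re-expanding clearance gas fills the whole suction stroke. The default clearance ratio and polytropic index below are hypothetical values, not the study's parameters.

```python
import numpy as np


def volumetric_efficiency(pressure_ratio, C: float = 0.05, n: float = 1.3):
    """Clearance-volume estimate eta_v = 1 + C - C * r**(1/n), vectorized over r."""
    r = np.asarray(pressure_ratio, dtype=float)
    return 1.0 + C - C * r ** (1.0 / n)


def limiting_pressure_ratio(C: float = 0.05, n: float = 1.3) -> float:
    """Pressure ratio at which eta_v reaches zero: r_max = ((1 + C) / C) ** n."""
    return ((1.0 + C) / C) ** n


if __name__ == "__main__":
    ratios = np.linspace(1.0, 10.0, 10)
    for r, eta in zip(ratios, volumetric_efficiency(ratios)):
        print(f"PR = {r:4.1f}  eta_v = {eta:.3f}")
    print(f"eta_v falls to zero at PR = {limiting_pressure_ratio():.1f}")
```

Comparing this idealized curve with the MATLAB curve in Fig. 4 indicates how much of the efficiency loss is explained by clearance re-expansion alone.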
[See PDF for image]
Fig. 4
Volumetric efficiency vs pressure ratio graph given by the MATLAB model
Volumetric efficiency therefore influences the work and energy input required by the compressor. A higher ηᵥ at a given PR lowers the work input and improves energy efficiency, whereas a decline in ηᵥ at higher PRs increases mechanical work input, energy losses, and thermal inefficiencies. ηᵥ can be improved through pre-compression intercooling, advanced cylinder geometry, and optimized valve timing. Understanding the relationship between ηᵥ and PR leads to energy-efficient compressor designs that require less power while delivering gas at high pressure, improving overall efficiency in industrial applications.
Machine learning model development
The machine learning algorithms used for prediction in this study were linear regression (with polynomial features), decision tree regressor, random forest regressor, support vector regressor, and XGBoost regressor. Before these algorithms were applied, the dataset was preprocessed. First, the structure of the dataset was examined using the [df.info()] method of the pandas library in Python, which gives an overview of the dataset, including the column names and the number of non-null values in each column. Next, the [df.describe()] method was used to generate a statistical summary of the dataset. Then, [df.isnull().sum()] was used to obtain the total number of null values in each column (Figs. 5 and 6). The second preprocessing step was the identification of outliers using boxplots based on the interquartile range (IQR) method, in which outliers appear as individual points beyond the whiskers. In Fig. 7, the box represents the middle 50% of the data, from Q1 (first quartile) to Q3 (third quartile), and the whiskers extend to the most extreme data points within 1.5 times the IQR below Q1 and above Q3. The third preprocessing step was feature scaling, a critical step in machine learning and data analysis that transforms numerical features to a standard range so that all features contribute comparably to the model and features with large values do not dominate it.
Scaling prevents bias due to large differences in scale, speeds up convergence in gradient descent, and improves algorithms that are sensitive to feature magnitude, such as SVM, KNN, PCA, and K-means clustering, several of which rely on Euclidean distances. To choose the type of feature scaling, histograms of all features in the dataset were plotted to examine their distributions. Standard scaling (zero mean, unit variance) was used because almost all columns of the dataset are approximately normally distributed, with skewness between −1 and +1 for most of them, as shown in Fig. 8. The histograms for the multistage compressors with and without intercoolers showed the same trend; therefore, the same scaling was applied to them.
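The preprocessing steps above can be sketched as follows: the pandas inspection calls named in the text, the 1.5 × IQR whisker rule used for the boxplots, a skewness check against the ±1 range, and standardization. The use of scikit-learn's `StandardScaler` and the helper names are assumptions for illustration; the text only states that a standard scaler was used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def summarize(df: pd.DataFrame) -> pd.Series:
    """Print structure and statistics, and return the null count per column."""
    df.info()                 # column names, dtypes, non-null counts
    print(df.describe())      # count, mean, std, min, quartiles, max
    return df.isnull().sum()  # number of missing values per column


def iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot whisker rule)."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    return (df < q1 - k * iqr) | (df > q3 + k * iqr)


def standardize(df: pd.DataFrame, max_abs_skew: float = 1.0):
    """Z-score scaling; also report columns whose skewness lies outside +/- max_abs_skew."""
    skewed = df.columns[df.skew().abs() > max_abs_skew].tolist()
    scaler = StandardScaler()
    scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)
    return scaled, scaler, skewed
```

Columns reported in `skewed` are candidates for a different transform before standardization. Fitting the scaler on the training split only and reusing it on the test split avoids leaking test statistics into training.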
[See PDF for image]
Fig. 5
Illustration of all the datasets, machine learning models and prediction graphs for study of reciprocating compressors
[See PDF for image]
Fig. 6
Complete Machine Learning workflow for the prediction of performance parameters of reciprocating compressors
[See PDF for image]
Fig. 7
Boxplot for outlier detection of dataset for single stage reciprocating compressor
[See PDF for image]
Fig. 8
Histogram analysis for feature scaling of single stage reciprocating compressor
Following feature scaling, a systematic feature selection process was conducted to identify the most influential predictors for the model. The primary operational parameters available from the simulation data, namely inlet pressure, inlet temperature, and rotational speed (RPM), were considered as input features to predict the various output parameters (e.g., outlet pressure, outlet temperature, and power required). To quantify the relationships between these features and the target variables, a Pearson correlation matrix was computed and visualized as a heatmap (Fig. 9). Features showing a strong correlation with the target outputs were selected for model training. This ensures that the models are built on the most influential predictors, which reduces model complexity and mitigates the risk of including redundant or irrelevant variables. The final curated dataset was then split into a training set (80%) and a testing set (20%) for model training and subsequent evaluation. To optimize predictive performance and ensure the robustness of the data-driven models, the hyperparameters of each algorithm except the standard polynomial regression model were tuned using a grid search with fivefold cross-validation. The grid search identified the combination of hyperparameters that yielded the highest coefficient of determination (R²) on the validation folds, so that the selected models are not only accurate but also generalize well to new data. The hyperparameters and the ranges of values explored for each model are listed in Table 1.
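The correlation-based selection and the 80/20 split can be sketched as below. The text does not state what counts as a "strong" correlation, so the threshold of 0.5 on |r| is a hypothetical choice, as are the helper names and the random seed; scikit-learn's `train_test_split` is assumed for the split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def select_by_correlation(df: pd.DataFrame, target: str, candidates, threshold: float = 0.5):
    """Keep candidate inputs whose |Pearson r| with the target is at least `threshold`."""
    r = df[list(candidates) + [target]].corr(method="pearson")[target].drop(target)
    return r[r.abs() >= threshold].index.tolist()


def split_80_20(df: pd.DataFrame, features, target: str, seed: int = 42):
    """80% training / 20% testing split of the selected features."""
    return train_test_split(df[features], df[target], test_size=0.2, random_state=seed)
```

A fixed `random_state` makes the split reproducible, so every model in Table 2 is evaluated on the same held-out rows.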
[See PDF for image]
Fig. 9
Correlation heatmap for multistage reciprocating compressor with intercooler
Table 1. Configuration of the Hyperparameter Grid Search for Each Model
| Model | Hyperparameter | Values searched (grid) |
|---|---|---|
| Polynomial regression | Polynomial degree | 3 (fixed) |
| Decision tree | Max depth | [5, 10, None] |
| | Min samples split | [2, 5, 10] |
| Random forest | N estimators | [50, 100, 200] |
| | Max depth | [5, 10, None] |
| Support vector regression | C (regularization) | [0.1, 1, 10] |
| | Kernel | ['linear', 'rbf'] |
| | Epsilon | [0.01, 0.1, 0.2] |
| XGBoost regressor | N estimators | [50, 100, 200] |
| | Learning rate | [0.01, 0.1, 0.2] |
| | Max depth | [3, 6, 9] |
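The grid search in Table 1 can be sketched with scikit-learn's `GridSearchCV`, assuming scikit-learn (and, optionally, the `xgboost` package, whose `XGBRegressor` follows the scikit-learn estimator API) is the toolkit used; the text names only pandas explicitly. Fivefold cross-validation with R² scoring follows the description above, while the `random_state` values and the dictionary and function names are hypothetical. The inputs are assumed to be already standardized, as described in the preprocessing step.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

try:  # XGBoost is optional in this sketch
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None

# Hyperparameter grids transcribed from Table 1
SEARCH_SPACE = {
    "decision_tree": (
        DecisionTreeRegressor(random_state=42),
        {"max_depth": [5, 10, None], "min_samples_split": [2, 5, 10]},
    ),
    "random_forest": (
        RandomForestRegressor(random_state=42),
        {"n_estimators": [50, 100, 200], "max_depth": [5, 10, None]},
    ),
    "svr": (
        SVR(),
        {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"], "epsilon": [0.01, 0.1, 0.2]},
    ),
}
if XGBRegressor is not None:
    SEARCH_SPACE["xgboost"] = (
        XGBRegressor(random_state=42),
        {"n_estimators": [50, 100, 200], "learning_rate": [0.01, 0.1, 0.2], "max_depth": [3, 6, 9]},
    )


def tune(name, X_train, y_train):
    """Fivefold grid search maximizing R^2; returns the refit best model, its parameters and CV score."""
    estimator, grid = SEARCH_SPACE[name]
    search = GridSearchCV(estimator, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_, search.best_score_


def fit_polynomial(X_train, y_train, degree=3):
    """Polynomial regression with a fixed degree (no grid search), as in Table 1."""
    return make_pipeline(PolynomialFeatures(degree=degree), LinearRegression()).fit(X_train, y_train)
```

Because `GridSearchCV` refits the best combination on the full training set by default, `best_estimator_` can be evaluated directly on the held-out 20% test split.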
In this study, different machine learning algorithms were employed to predict some of the most significant performance characteristics of reciprocating compressors, namely pressure–volume (PV) diagrams, outlet temperature vs. time plots, and volumetric efficiency vs. pressure ratio curves. Figure 5 illustrates the datasets, machine learning models, and prediction graphs used in the study of the reciprocating compressors, and Fig. 6 shows the complete machine learning workflow. The selected models (linear regression with polynomial features, decision tree regressor, random forest regressor, support vector regressor (SVR), and XGBoost regressor) were applied to three datasets generated via MATLAB simulations, representing a single-stage reciprocating compressor, a multistage reciprocating compressor without an intercooler, and a multistage reciprocating compressor with an intercooler. Linear regression with polynomial features was used to capture the nonlinear relationships inherent in the compressor performance curves, particularly the variations in volumetric efficiency.
The decision tree and random forest regressors were employed because they can capture complex nonlinear patterns and interactions between multiple input variables without assuming a specific functional form. The SVR was included because it can handle high-dimensional feature spaces and is therefore well suited to capturing fine-grained variations in thermodynamic parameters. Finally, the XGBoost regressor, a gradient-boosted decision tree model, was included because of its high prediction accuracy and strong performance on structured numerical data. By applying these models to all three compressor datasets, this study compared their ability to predict the main performance indicators and identified the best-performing model for each dataset. This comparison provides insight into the suitability of various regression techniques for compressor modeling and supports the formulation of data-driven optimization approaches for industrial compressor systems. In addition, these predictions and comparisons were used to verify the results of the MATLAB simulations, confirming that the multistage reciprocating compressor with an intercooler is the most efficient. This finding agrees with established thermodynamic principles: intercooling reduces the temperature rise in each stage, which reduces the compression work and increases the volumetric efficiency.
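The thermodynamic argument for intercooling can be checked with an idealized calculation: two isentropic stages with equal pressure ratios, with and without perfect intercooling back to the inlet temperature. These assumptions (isentropic stages, equal stage ratios, perfect intercooling, air with k = 1.4 and R = 287 J/(kg·K)) are simplifications for illustration, not the study's Simulink model; the overall ratio of 6 mirrors the six-fold pressure rise reported for the intercooled compressor.

```python
def isentropic_work(T_in: float, pressure_ratio: float, k: float = 1.4, R: float = 287.0) -> float:
    """Isentropic compression work per kg of gas [J/kg]."""
    return k / (k - 1.0) * R * T_in * (pressure_ratio ** ((k - 1.0) / k) - 1.0)


def two_stage_work(T1: float, overall_ratio: float, intercooled: bool,
                   k: float = 1.4, R: float = 287.0) -> float:
    """Two equal-ratio stages; with intercooling, the gas is cooled back to T1 before stage 2."""
    stage_ratio = overall_ratio ** 0.5
    w1 = isentropic_work(T1, stage_ratio, k, R)
    T2 = T1 * stage_ratio ** ((k - 1.0) / k)  # stage-1 discharge temperature
    T_in2 = T1 if intercooled else T2
    return w1 + isentropic_work(T_in2, stage_ratio, k, R)


if __name__ == "__main__":
    T1, ratio = 300.0, 6.0  # hypothetical inlet temperature [K] and overall pressure ratio
    w_single = isentropic_work(T1, ratio)
    w_plain = two_stage_work(T1, ratio, intercooled=False)
    w_cooled = two_stage_work(T1, ratio, intercooled=True)
    print(f"single stage:          {w_single / 1000:.1f} kJ/kg")
    print(f"two stages, no cooler: {w_plain / 1000:.1f} kJ/kg")
    print(f"two stages, cooled:    {w_cooled / 1000:.1f} kJ/kg "
          f"({100 * (1 - w_cooled / w_plain):.1f}% less)")
```

Without intercooling, splitting the compression into two isentropic stages requires the same work as a single stage; the saving comes entirely from the lower inlet temperature of the second stage.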
Results
The performance of the machine learning models in predicting critical compressor parameters was evaluated on datasets for three compressor configurations: a single-stage reciprocating compressor, a multistage reciprocating compressor without an intercooler, and a multistage reciprocating compressor with an intercooler. The three datasets were used to train and validate five regression algorithms (linear regression with polynomial features, decision tree regressor, random forest regressor, support vector regressor, and XGBoost regressor) to predict the key performance curves, namely pressure–volume (PV) plots, outlet temperature vs. time graphs, and volumetric efficiency vs. pressure ratio plots. The predicted values were compared with the actual values obtained from the MATLAB simulations to verify the accuracy and efficiency of the models. All performance metrics and plots presented hereafter were generated exclusively from the unseen testing dataset to ensure an unbiased evaluation of each model's generalization capability.
As shown in Table 2, the mean absolute error (MAE), mean squared error (MSE), root-mean-squared error (RMSE), and R-squared (R²) score were used as evaluation metrics to give a complete picture of each model's predictive capability. Linear regression with polynomial features reproduced the simulated curves almost exactly, with R² of 0.9999 or higher for the single-stage compressor and the multistage compressor without an intercooler. Among the remaining models, XGBoost and the random forest regressor generally achieved the highest predictive accuracy, with the lowest errors and highest R² values; these ensemble models captured the intricate nonlinear trends in the data and were therefore well suited to compressor performance prediction. Conversely, the decision tree regressor tended to overfit in some instances, whereas the support vector regressor showed moderate performance, particularly for datasets with complex relationships.
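The four metrics in Table 2 follow their standard definitions, sketched below with NumPy (scikit-learn provides equivalent functions such as `mean_absolute_error`, `mean_squared_error`, and `r2_score`). Since RMSE is the square root of MSE, the table rows can be cross-checked; for example, 14.1559² ≈ 200.39 for the decision tree on the single-stage PV diagram. The function name is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MAE, MSE, RMSE and R^2 between reference (MATLAB) values and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))                      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Evaluating these metrics only on the held-out test rows, as described above, keeps the comparison with the MATLAB results unbiased.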
Table 2. Performance evaluation of machine learning models on the testing dataset. Key error metrics are used to compare predictive accuracy
| Compressor type | Graph | ML model | MAE | MSE | RMSE | R² |
|---|---|---|---|---|---|---|
| Single-stage compressor | P-V diagram | Linear regression | 6.1097 × 10⁻⁶ | 1.9937 × 10⁻¹⁰ | 1.412 × 10⁻⁵ | 0.9999 |
| | | Decision tree regressor | 7.8921 | 200.3895 | 14.1559 | 0.9843 |
| | | Random forest regressor | 5.8743 | 110.6956 | 10.5212 | 0.9930 |
| | | Support vector regressor | 6.4505 | 141.3649 | 11.8897 | 0.9829 |
| | | XGBoost regressor | 5.4560 | 90.8151 | 9.5297 | 0.9939 |
| | Outlet temperature vs. time | Linear regression | 7.3157 × 10⁻¹³ | 1.2095 × 10⁻²⁴ | 1.0998 × 10⁻¹² | 1 |
| | | Decision tree regressor | 2.6749 | 10.2009 | 3.1939 | 0.9883 |
| | | Random forest regressor | 2.0803 | 7.5817 | 2.7535 | 0.9913 |
| | | Support vector regressor | 2.3647 | 10.8959 | 3.3009 | 0.9875 |
| | | XGBoost regressor | 2.0426 | 7.1588 | 2.6756 | 0.9918 |
| | Volumetric efficiency vs. pressure ratio | Linear regression | 3.0381 × 10⁻⁷ | 7.97252 × 10⁻¹³ | 8.9289 × 10⁻⁷ | 0.9999 |
| | | Decision tree regressor | 0.0016 | 0.00000529 | 0.0023 | 0.9998 |
| | | Random forest regressor | 0.0009 | 0.00000196 | 0.0014 | 0.9999 |
| | | Support vector regressor | 0.0118 | 0.00028 | 0.0168 | 0.9908 |
| | | XGBoost regressor | 0.0025 | 0.00001369 | 0.0037 | 0.9996 |
| Multistage compressor without intercooler | P-V diagram | Linear regression | 0.0178 | 0.0011 | 0.0344 | 0.9999 |
| | | Decision tree regressor | 5.1191 | 92.9797 | 9.6426 | 0.9492 |
| | | Random forest regressor | 3.2608 | 32.8661 | 5.7329 | 0.9770 |
| | | Support vector regressor | 3.1527 | 39.1137 | 6.2541 | 0.9792 |
| | | XGBoost regressor | 3.3809 | 39.7278 | 6.3030 | 0.9764 |
| | Outlet temperature vs. time | Linear regression | 2.8023 × 10⁻¹³ | 1.51632 × 10⁻²⁵ | 3.8940 × 10⁻¹³ | 1 |
| | | Decision tree regressor | 9.2581 | 127.1323 | 11.2753 | 0.8344 |
| | | Random forest regressor | 7.0392 | 70.3384 | 8.3868 | 0.9083 |
| | | Support vector regressor | 2.5324 | 17.7569 | 4.2139 | 0.9768 |
| | | XGBoost regressor | 5.3681 | 42.4517 | 6.5155 | 0.9447 |
| | Volumetric efficiency vs. pressure ratio | Linear regression | 0.0001 | 5.7121 × 10⁻⁸ | 0.000239 | 0.9999 |
| | | Decision tree regressor | 0.1337 | 0.0673 | 0.2596 | 0.9625 |
| | | Random forest regressor | 0.0854 | 0.0256 | 0.1602 | 0.9865 |
| | | Support vector regressor | 0.0956 | 0.0362 | 0.1905 | 0.9779 |
| | | XGBoost regressor | 0.0714 | 0.0178 | 0.1335 | 0.9886 |
| Multistage compressor with intercooler | P-V diagram | Linear regression | 2.8774 × 10⁻⁷ | 7.26551 × 10⁻¹³ | 8.5238 × 10⁻⁷ | 1 |
| | | Decision tree regressor | 8.0130 | 800.0016 | 28.2843 | 0.9192 |
| | | Random forest regressor | 10.681 | 377.3422 | 19.4253 | 0.9457 |
| | | Support vector regressor | 7.8997 | 161.7017 | 12.7162 | 0.9859 |
| | | XGBoost regressor | 7.6241 | 396.7307 | 19.9181 | 0.9304 |
| | Outlet temperature vs. time | Linear regression | 1.4791 × 10⁻⁶ | 8.195 × 10⁻¹² | 2.8627 × 10⁻⁶ | 0.9999 |
| | | Decision tree regressor | 1.7407 | 7.023 | 2.6501 | 0.9957 |
| | | Random forest regressor | 1.4397 | 7.3197 | 2.7055 | 0.9955 |
| | | Support vector regressor | 0.1936 | 0.0485 | 0.2204 | 0.9999 |
| | | XGBoost regressor | 1.6924 | 6.7756 | 2.603 | 0.9958 |
| | Volumetric efficiency vs. pressure ratio | Linear regression | 5.381 × 10⁻⁷ | 2.5338 × 10⁻¹² | 1.5918 × 10⁻⁶ | 0.9999 |
| | | Decision tree regressor | 0.0811 | 0.16003 | 0.40004 | 0.9837 |
| | | Random forest regressor | 0.0914 | 0.06185 | 0.2487 | 0.9934 |
| | | Support vector regressor | 0.1023 | 0.02103 | 0.14504 | 0.9929 |
| | | XGBoost regressor | 0.2035 | 0.3594 | 0.5995 | 0.9636 |
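To make the per-task ranking in Table 2 explicit, the sketch below transcribes the test R² values of the four non-polynomial models and selects the best one for each configuration and graph. Ranking by R² alone is an assumption made for illustration: MAE can give a different order (for the intercooled PV diagram, XGBoost has the lowest MAE while the support vector regressor has the highest R²).

```python
# Test-set R^2 values transcribed from Table 2 (polynomial linear regression excluded).
# DT = decision tree, RF = random forest, SVR = support vector regressor, XGB = XGBoost.
TEST_R2 = {
    ("single-stage", "PV"): {"DT": 0.9843, "RF": 0.9930, "SVR": 0.9829, "XGB": 0.9939},
    ("single-stage", "T_out vs t"): {"DT": 0.9883, "RF": 0.9913, "SVR": 0.9875, "XGB": 0.9918},
    ("single-stage", "eta_v vs PR"): {"DT": 0.9998, "RF": 0.9999, "SVR": 0.9908, "XGB": 0.9996},
    ("multistage, no intercooler", "PV"): {"DT": 0.9492, "RF": 0.9770, "SVR": 0.9792, "XGB": 0.9764},
    ("multistage, no intercooler", "T_out vs t"): {"DT": 0.8344, "RF": 0.9083, "SVR": 0.9768, "XGB": 0.9447},
    ("multistage, no intercooler", "eta_v vs PR"): {"DT": 0.9625, "RF": 0.9865, "SVR": 0.9779, "XGB": 0.9886},
    ("multistage, intercooler", "PV"): {"DT": 0.9192, "RF": 0.9457, "SVR": 0.9859, "XGB": 0.9304},
    ("multistage, intercooler", "T_out vs t"): {"DT": 0.9957, "RF": 0.9955, "SVR": 0.9999, "XGB": 0.9958},
    ("multistage, intercooler", "eta_v vs PR"): {"DT": 0.9837, "RF": 0.9934, "SVR": 0.9929, "XGB": 0.9636},
}


def best_model(scores):
    """Name of the model with the highest test R^2 for one task."""
    return max(scores, key=scores.get)


best = {task: best_model(scores) for task, scores in TEST_R2.items()}

# Count how many tasks each model wins.
wins = {}
for model in best.values():
    wins[model] = wins.get(model, 0) + 1

# Share of all model/task pairs whose test R^2 reaches 0.90.
all_scores = [r2 for scores in TEST_R2.values() for r2 in scores.values()]
share_at_least_090 = sum(r2 >= 0.90 for r2 in all_scores) / len(all_scores)

for task, model in best.items():
    print(task, "->", model)
print(wins)
print(f"{share_at_least_090:.0%} of model/task pairs reach R^2 >= 0.90")
```

With these values, XGBoost ranks first on three tasks, the random forest on two, and the support vector regressor on four; 35 of the 36 model and task combinations reach a test R² of at least 0.90, the exception being the decision tree on the outlet temperature of the multistage compressor without an intercooler.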
Notably, linear regression with polynomial features achieved even lower test errors than the XGBoost and random forest regressors in every task in Table 2, reflecting the smooth, well-specified relationships in these simulated datasets. However, this precision does not make it a general-purpose model: it is prone to overfitting when the underlying relationships are strongly nonlinear or the data are noisy. Polynomial regression models also tend to fit very small variations in the training data too closely, which can reduce their applicability to new data and their accuracy in forecasting the complex thermodynamic responses of compressor systems, particularly under variable operating conditions. To substantiate empirically the claim that certain models were prone to overfitting, model performance was compared on the training and testing datasets. A marked drop in predictive accuracy from the training data to new, unseen data is a standard indicator of overfitting.
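The near-zero errors of polynomial regression are what one expects when the target is a smooth function that a low-order polynomial represents well. Below is a minimal plain-Python sketch of linear regression with polynomial features, assuming one input, degree 2, and a hypothetical quadratic efficiency curve (the study's polynomial degree and library are not stated in this section).

```python
def polynomial_features(x, degree):
    """Map a scalar input to the feature vector [1, x, x**2, ..., x**degree]."""
    return [x ** k for k in range(degree + 1)]


def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular or badly conditioned")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """Ordinary least squares on polynomial features via the normal equations X^T X w = X^T y."""
    rows = [polynomial_features(x, degree) for x in xs]
    p = degree + 1
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Hypothetical smooth relation between pressure ratio and volumetric efficiency,
# used only to illustrate the fit; it is not the study's MATLAB model.
ratios = [1.0 + 0.25 * i for i in range(12)]
efficiencies = [0.95 - 0.04 * r - 0.01 * r ** 2 for r in ratios]

coeffs = fit_polynomial(ratios, efficiencies, degree=2)
max_error = max(abs(predict(coeffs, r) - e) for r, e in zip(ratios, efficiencies))
print([round(c, 6) for c in coeffs], f"max error = {max_error:.1e}")
```

Because the hypothetical target lies exactly in the degree-2 model family, the fit recovers it to rounding error. Raising the degree on noisy data would let the same code chase small fluctuations, which is the overfitting risk described above. The normal equations also become badly conditioned at high degree, so QR- or SVD-based solvers are preferable in practice.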
As shown in Table 3, the decision tree model achieved a perfect R² score of 1.000 on the training data in both multistage compressor scenarios. However, its performance on the unseen test data degraded markedly, with the R² score dropping to 0.8344 for outlet temperature prediction (without an intercooler) and 0.9192 for PV diagram prediction (with an intercooler). This performance gap is a classic indicator of overfitting, in which the model memorizes the noise and fine-scale variations of the training set rather than learning the underlying physical relationships. This analysis supports the observation that simple, unconstrained models are ill-suited to this task and underscores the value of more robust ensemble methods such as random forest and XGBoost.
Table 3. Comparison of training and testing R2 scores for the decision tree model, providing quantitative evidence of overfitting
| Compressor type | Prediction task | Model | Train R² | Test R² |
|---|---|---|---|---|
| Multistage without intercooler | Outlet temperature vs. time | Decision tree | 1.000 | 0.8344 |
| Multistage with intercooler | PV diagram | Decision tree | 1.000 | 0.9192 |
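The mechanism behind the gap in Table 3 can be reproduced in a few lines. On a single input feature with distinct values, an unpruned regression tree that places split thresholds at midpoints between neighbouring training values (as scikit-learn's `DecisionTreeRegressor` does) ends with one sample per leaf, so it behaves like a nearest-neighbour lookup. The sketch below assumes synthetic sine-plus-noise data (not the study's data) and shows a training R² of exactly 1 alongside a lower test R².

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def fully_grown_tree_1d(train_x, train_y):
    """Predictor equivalent to an unpruned regression tree on one feature.

    With distinct inputs and split thresholds at midpoints between neighbouring
    training values, every leaf holds exactly one sample, so a prediction is the
    target of the nearest training input (exact ties go to the lower input).
    """
    pairs = sorted(zip(train_x, train_y))

    def predict(x):
        return min(pairs, key=lambda pair: abs(pair[0] - x))[1]

    return predict


# Synthetic smooth curve plus Gaussian noise (an assumption for illustration only).
rng = random.Random(0)
train_x = [i / 10 for i in range(60)]
test_x = [i / 10 + 0.04 for i in range(59)]  # unseen inputs between training points
train_y = [math.sin(x) + rng.gauss(0.0, 0.1) for x in train_x]
test_y = [math.sin(x) + rng.gauss(0.0, 0.1) for x in test_x]

tree = fully_grown_tree_1d(train_x, train_y)
train_r2 = r2_score(train_y, [tree(x) for x in train_x])
test_r2 = r2_score(test_y, [tree(x) for x in test_x])
print(f"train R^2 = {train_r2:.4f}, test R^2 = {test_r2:.4f}")
```

The training score is perfect by construction because each training input is its own nearest neighbour. Even with noise-free simulation data, the test score would stay below 1, because a piecewise-constant prediction cannot follow the curve between training points. Limiting depth or leaf size, or averaging many trees as a random forest does, reduces this effect.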
Fig. 10 shows the actual vs. predicted PV diagram for the single-stage reciprocating compressor obtained with the XGBoost regressor, and Figs. 11 and 12 show the corresponding predictions for the multistage reciprocating compressor without and with an intercooler, respectively. Fig. 13 shows the outlet temperature vs. time graph for the single-stage compressor, and Figs. 14 and 15 show the same graph for the multistage reciprocating compressor without and with an intercooler, respectively. In the same order, Figs. 16, 17, and 18 show the volumetric efficiency vs. pressure ratio graphs for the single-stage compressor, the multistage compressor without an intercooler, and the multistage compressor with an intercooler. All the prediction graphs shown are those of the XGBoost regressor, which was consistently accurate across all prediction tasks, with a test R² of at least 0.93 in every case (Table 2).
Fig. 10. Actual vs. predicted PV diagram for the single-stage reciprocating compressor on the testing dataset
Fig. 11. Actual vs. predicted PV diagram for the multistage reciprocating compressor without an intercooler on the testing dataset
Fig. 12. Actual vs. predicted PV diagram for the multistage reciprocating compressor with an intercooler on the testing dataset
Fig. 13. Outlet temperature vs. time for the single-stage compressor on the testing dataset
Fig. 14. Outlet temperature vs. time for the multistage reciprocating compressor without an intercooler on the testing dataset
Fig. 15. Outlet temperature vs. time for the multistage reciprocating compressor with an intercooler on the testing dataset
Fig. 16. Volumetric efficiency vs. pressure ratio for the single-stage reciprocating compressor on the testing dataset
Fig. 17. Volumetric efficiency vs. pressure ratio for the multistage reciprocating compressor without an intercooler on the testing dataset
Fig. 18. Volumetric efficiency vs. pressure ratio for the multistage reciprocating compressor with an intercooler on the testing dataset
Conclusion
Verifying the mathematical models of multistage reciprocating compressors is essential to ensure their accuracy in real-world operation. This study used machine learning (ML) models to verify MATLAB-based simulations of single-stage compressors, multistage compressors without an intercooler, and multistage compressors with an intercooler in terms of pressure–volume (PV) relationships, outlet temperature variations, and volumetric efficiency trends.
The MATLAB simulations revealed marked differences among the three configurations. The single-stage compressor raised the outlet pressure to 1.6 times the inlet pressure, reduced the volume by 20%, and increased the outlet temperature by 30%. The multistage compressor without an intercooler produced a 3.3–3.6× pressure rise, a 60% reduction in volume, and a 35% rise in temperature. The intercooled multistage compressor achieved a 6× pressure rise and an 80% volume reduction while restricting the temperature rise to 30%, and its efficiency was 12% higher than that of the multistage compressor without an intercooler.
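The reported pressure, volume, and temperature changes can be cross-checked roughly with the ideal-gas relation T2/T1 = (p2/p1)(V2/V1). This sketch assumes a fixed mass of ideal gas, absolute temperatures, and volume reductions that refer to the same gas charge; none of these assumptions are confirmed for the MATLAB model.

```python
def temperature_ratio(pressure_ratio: float, volume_fraction: float) -> float:
    """Absolute temperature ratio T2/T1 = (p2/p1) * (V2/V1) for a fixed mass of ideal gas."""
    return pressure_ratio * volume_fraction


# (lowest pressure ratio, highest pressure ratio, V2/V1) as summarised above.
CASES = {
    "single-stage": (1.6, 1.6, 1.0 - 0.20),
    "multistage, no intercooler": (3.3, 3.6, 1.0 - 0.60),
    "multistage, intercooler": (6.0, 6.0, 1.0 - 0.80),
}

estimates = {}
for name, (p_low, p_high, v_fraction) in CASES.items():
    estimates[name] = (temperature_ratio(p_low, v_fraction),
                       temperature_ratio(p_high, v_fraction))
    low, high = estimates[name]
    print(f"{name}: estimated T2/T1 = {low:.2f} to {high:.2f}")
```

Under these assumptions the estimated temperature rises are about 28%, 32 to 44%, and 20%, compared with the reported 30%, 35%, and 30%. The simple relation does not capture how the study defines these percentages (for example, clearance volume or the temperature reference), so the comparison is indicative only.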
The ML algorithms (XGBoost, random forest, support vector regressor, decision tree, and linear regression with polynomial features) achieved test R² values of about 0.90 or higher in almost all cases, consistent with the MATLAB values. XGBoost and random forest were generally among the most effective at capturing the nonlinear relationships, although the support vector regressor performed best on several multistage prediction tasks. Although linear regression with polynomial features was highly accurate on these datasets, it is prone to overfitting and may therefore be less reliable under more intricate or noisy thermodynamic conditions.
This study demonstrates the efficacy of ML-augmented validation for compressor performance analysis and supports the use of data-driven methods to complement conventional simulations. These results provide useful guidance for compressor design optimization and for the development of ML-based verification tools for thermodynamic systems.
Acknowledgements
I extend my appreciation to my friends and colleagues, who have been supportive throughout and provided a stimulating academic environment.
Authors’ contributions
Yashraj Sanap: Conceptualization, Methodology, Software. Pritibala Ingle: Data curation, Writing - Original draft preparation. Pravin Hujare: Visualization, Investigation. Umesh Chavan: Supervision. Varad Atre: Software, Validation. Deepak Hujare: Writing - Reviewing and Editing.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data availability
The data that support the findings of this study are available on request from the corresponding author.
Declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Abbreviations
ML: Machine learning
PV: Pressure–volume
GPR: Gaussian process regression
CFD: Computational fluid dynamics
MCSA: Motor current signature analysis
SVM: Support vector machine
LSTM: Long short-term memory
RUL: Remaining useful life
SVR: Support vector regressor
MAE: Mean absolute error
MSE: Mean squared error
RMSE: Root-mean-squared error
IQR: Interquartile range
PR: Pressure ratio
KNN: K-nearest neighbors
PCA: Principal component analysis
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Achouch M, Dimitrova M, Dhouib R, Ibrahim H, Adda M, Sattarpanah Karganroudi S, Ziane K, Aminzadeh A (2023) Predictive maintenance and fault monitoring enabled by machine learning: experimental analysis of a TA-48 multistage centrifugal plant compressor. Appl Sci 13:1790. https://dx.doi.org/10.3390/app13031790
2. Aminzadeh A, Sattarpanah Karganroudi S, Majidi S, Dabompre C, Azaiez K, Mitride C, Sénéchal E (2025) A machine learning implementation to predictive maintenance and monitoring of industrial compressors. Sensors 25:1006. https://dx.doi.org/10.3390/sensors25041006
3. Haba U, Brethee K, Alabied S, Mondal D, Gu F, Ball A (2018) Model-based fault detection and diagnosis of a two-stage reciprocating compressor using motor current signature analysis. Proceedings of the 31st Conference on Condition Monitoring and Diagnostic Engineering Management
4. Hsieh SH, Shih YC, Hsieh W-H, Lin FY, Tsai MJ (2012) Performance analysis of screw compressors - numerical simulation and experimental verification. Proc Inst Mech Eng C J Mech Eng Sci. https://dx.doi.org/10.1177/0954406211417961
5. Hao X, Zhang Z, Chi J, He Y (2023) Research on the methods of predicting compressor characteristic curve. Int J Energy Res 2023:8848649. https://dx.doi.org/10.1155/2023/8848649
6. Joly M, Sarkar S, Mehta D (2019) Machine learning enabled adaptive optimization of a transonic compressor rotor with precompression. J Turbomach 141
7. Kovacevic A, Rane S, Stosic N (2016) Computational fluid dynamics in rotary positive displacement machines. 16th International Symposium on Transport Phenomena and Dynamics of Rotating Machinery
8. Kumar A, Patil S, Kovacevic A, Ponnusami SA (2024) Performance prediction and Bayesian optimization of screw compressors using Gaussian process regression. Eng Appl Artif Intell 133:108270. https://dx.doi.org/10.1016/j.engappai.2024.108270
9. Lv Q, Yu X, Ma H, Ye J, Wu W, Wang X (2021) Applications of machine learning to reciprocating compressor fault diagnosis: a review. Processes 9:909. https://dx.doi.org/10.3390/pr9060909
10. Marx J, Gantner S, Städing J, Friedrichs J (2018) A machine learning-based approach of performance estimation for high-pressure compressor airfoils. Turbo Expo: Power for Land, Sea, and Air, 51029, V02DT46A004
11. Nambiar A, Venkatesh NS, Aravinth S, Sugumaran V, Ramteke SM, Marian M (2024) Prediction of air compressor faults with feature fusion and machine learning. Knowl-Based Syst 304:112519. https://dx.doi.org/10.1016/j.knosys.2024.112519
12. Pakatchian MR, Ziamolki A, Alhuyi Nazari M (2023) Applications of machine learning approaches in aerodynamic aspects of axial flow compressors: a review. Front Energy Res 11:1135055. https://dx.doi.org/10.3389/fenrg.2023.1135055
13. Sjöstedt CJ (2004) Modelling of displacement compressors using MATLAB/Simulink. NordDesign 2004 – Product Design in Changing Environment, Tampere, Finland
14. Ying Y, Xu S, Li J, Zhang B (2020) Compressor performance modelling method based on support vector machine nonlinear regression algorithm. R Soc Open Sci 7:191596. https://dx.doi.org/10.1098/rsos.191596
15. Zhong L, Liu R, Miao X, Chen Y, Li S, Ji H (2023) Compressor performance prediction based on the interpolation method and support vector machine. Aerospace 10:558. https://dx.doi.org/10.3390/aerospace10060558
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.