Abstract
This study was conducted to enhance the efficiency of chemical process systems and address the limitations of conventional methods through hyperparameter optimization. Chemical processes are inherently continuous and nonlinear, making stable operation challenging. The efficiency of processes often varies significantly with the operator’s level of expertise, as most tasks rely on experience. To move beyond the constraints of traditional simulation approaches, a new machine learning-based simulation model was developed. This model utilizes a recurrent neural network (RNN) algorithm, which is ideal for analyzing time-series data from chemical process systems, presenting new possibilities for applications in systems with special chemical reactions or those that are continuous and complex. Hyperparameters were optimized using a grid search method, and optimal results were confirmed when the model was applied to an actual distillation process system. By proposing a methodology that utilizes machine learning for the optimization of chemical process systems, this research contributes to solving new problems that were previously unaddressed. Based on these results, the study demonstrates that a machine learning simulation model can be effectively applied to continuous chemical process systems. This application enables the derivation of unique hyperparameters tailored to the specificities of a limited control volume system.
1. Introduction
Simulation analyses play a crucial role in enhancing system performance across various industrial fields. Theoretical simulation methods based on mathematical modeling are commonly used for control [1–3], optimization [3, 4], and scheduling [5, 6], regardless of the target process or system type. Processes involving simple internal flows or chemical reactions can be operated more efficiently through basic one-dimensional simulation. In particular, the increase in computer calculation speed has made it possible to analyze systems and processes that could not be analyzed in the past, such as complex turbulent flows, combustion, chemical reactions, and multiphase flows. However, for nonlinear, multidimensional, and complex systems, process optimization using basic simulation techniques is limited by the numerous assumptions required. Particularly in systems such as chemical processes, which are nonlinear and have diverse characteristics, reliance on operator experience introduces variability in process efficiency depending on the skill level of the operator. Therefore, this study focuses on addressing these challenges by developing an empirical simulation model based on artificial neural networks.
Research using artificial intelligence and machine learning algorithms is a core field of the 4th Industrial Revolution and has witnessed a growing application in various fields based on data science. Research on factory intelligence, which can learn from phenomena observed in big data collected from the target system and apply the acquired knowledge to process control and operation, is also increasing significantly [7].
As shown in Figure 1, a typical simulation program constructs a prediction model for the target system based on theoretical concepts, followed by the derivation of a response for the input data. Conventional simulation methods require a theoretical basis for the phenomena occurring in a system, rendering it unfeasible to analyze undocumented theoretical phenomena or complex systems. In contrast, machine learning accomplishes this by permitting a computer to learn on its own from the input data and generate responses based on that data [8–11]. To construct machine learning models, specific principles are extracted from the data of the target system. This implies that systems can be analyzed irrespective of their size, shape, or reaction equation complexity. Machine learning models can also capture the operating know-how of experienced operators, which is difficult to describe theoretically, and can learn the influence of variables that strongly affect the system but cannot be manipulated directly.
[figure(s) omitted; refer to PDF]
Nevertheless, the effectiveness of these machine learning models fluctuates considerably in response to hyperparameters. Smith et al. determined that the training time can be substantially reduced by increasing the batch size instead of decaying the learning rate, thereby decreasing the number of parameter updates required for model learning [12]. When constructing machine learning models, Smith proposed a method for selecting hyperparameters that significantly reduced the training time and improved the model performance: a technique for balancing the training, validation, and test loss functions against underfitting and overfitting was established, along with a method for adjusting the learning rate to accelerate subsequent learning [13]. Yu et al. introduced a technique to dynamically adjust the learning rate based on the gradient of the loss function to accelerate the sluggish convergence of the initial learning process [14]. The efficacy of this approach was experimentally demonstrated across four architectures and diverse datasets, producing a precise model that converged more rapidly with the same number of repetitions (learning). This confirms the feasibility of enhancing models through hyperparameter adjustments [14]. Lederrey et al. investigated batch-size selection methods to enhance model development efficiency. They devised a novel algorithm called the hybrid adaptive moving average batch size (HAMABS), subjected it to rigorous experimental testing, and found that it decreased the model optimization time by a factor of approximately 23, confirming the viability and effectiveness of the proposed algorithm [15]. According to a study by Tso et al., the reduction in bias and variance error, as well as the ability to select the optimal model based on changes in hyperparameters (particularly k-fold), is contingent upon the complexity of the model; model optimization was accomplished by analyzing variations in the cross-validation hyperparameter (the lambda value) [16]. Samiee et al. utilized feedforward neural network analysis to assess the precision of diverse datasets and found that variations in the number of hidden neurons caused disparities in training performance [17, 18].
While several studies have examined the relationship between hyperparameter characteristics and model performance, there is a lack of research examining the correlation between these characteristics and model performance in practical, commercial settings. This gap is primarily due to the complex nature of artificial intelligence models, which often exhibit ambiguity in real-world applications. Consequently, in this study, we demonstrate the method to utilize machine learning techniques to optimize hyperparameters for a commercial chemical process, particularly within distillation columns, which are challenging to operate continuously due to the nonlinearity and uncertainty of the variables involved. By harnessing the power of empirical machine learning models, this study aims to develop a method that simplifies the complexity of distillation processes. This approach allows for a more detailed examination of the extensive and intricate operating conditions characteristic of commercial distillation systems, providing insights into critical processes for optimization and control. Our method emphasizes the importance of optimizing hyperparameters that directly influence the critical control points in distillation columns. Furthermore, we introduce a machine learning-based empirical model to navigate the intricate interplay of variables in a distillation column. This model allows for an enhanced understanding of the distillation process, crucial for maintaining continuous operation and high purity levels in the output products. Therefore, our study contributes a novel application of machine learning to optimize and control the distillation process in chemical process optimization.
2. Machine Learning Methodology
2.1. Data Mining
A critical step in the development of machine learning models is data mining, which involves locating and analyzing specific principles within a massive database. Implementing the most suitable approach requires a thorough understanding of the target process and of the data to be analyzed. Through normalization, data initially collected in various units are transformed into machine learning-compatible data [9]. Furthermore, to ensure the accuracy and dependability of the measurements, data preprocessing may be necessary, and any values that fall outside the expected range of the target process are eliminated as required [19].
2.2. Process Data Collection
The development of data collection technology has ushered in an era of big data, in which the effectiveness of machine learning models becomes evident when an abundant volume of data is available. Data collection is the initial phase in the development of machine learning models. The most commonly used techniques are variable extraction, model selection, and application, which are applied to the development model according to the attributes of the data gathered from the target process. In addition, it is crucial to gather data that accurately represent the process, specifically data in which the operator's extensive experience with the target process has been accumulated. Furthermore, when characteristic variables cannot be extracted, supplementary data acquisition is necessary to conduct the optimization analysis [8–11].
2.3. Extracting Characteristic Variables
The performance of machine learning models is substantially influenced by both the amount and quality of the data utilized. Data used to train machine learning models should therefore contain only accurate information in a suitable quantity. Transforming the collected data into such a form is referred to as "dimensionality reduction," one of the most fundamental preprocessing steps in machine learning. Dimensionality reduction encompasses more than mere data compression or noise elimination; it entails the derivation of a latent space that optimally represents the gathered data [20, 21]. Two methods exist for reducing the dimensions of data: feature selection and feature extraction.
Feature selection aims to generate a concise feature set by choosing, from all available features, the subset pertinent to model construction. Insignificant variables that have no meaningful effect on the intended outcome prediction can be omitted from the complete feature set, thereby reducing the dimensionality of the data.
Feature extraction is a technique that is used to generate new attributes by combining existing features or transforming high-dimensional source data into low-dimensional data. Principal component analysis (PCA) is the simplest and most well-known method for this purpose. By generating a function based on the correlation of the initial data, dimensionality reduction is achieved by constructing a novel feature comprising a linear combination of existing features. Well-known feature selection methods include Lasso [22], information gain [23], relief [24], MRMR [25], Fisher score [26], Laplacian score [27], and SPEC [28]. Our study advances this by considering a set of variables informed by domain expertise and analyzing their interdependence in the context of a commercial distillation process, thereby enhancing the predictive accuracy and relevance of the model to real-world operations.
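As an illustration of feature extraction, the following is a minimal sketch of PCA-based dimensionality reduction using scikit-learn. The variable names, the random data, and the choice of two components are assumptions for demonstration rather than the configuration used in this study.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical process data: rows are samples, columns are measured variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))

# Project the 8 original variables onto 2 principal components,
# i.e., new features built from linear combinations of the originals.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (1000, 2)
print(pca.explained_variance_ratio_)   # share of variance captured by each component
```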
2.4. Data Preprocessing
Prior to model training, data preprocessing was performed to transform the raw input values into data suitable for learning. Because large differences in the absolute magnitudes of the input variables can significantly affect model learning, every element of the data was preprocessed before analysis. Preprocessing methods can be broadly categorized into feature normalization and mean-centering methods. Mean centering facilitates learning by shifting the data so that the mean lies at zero, thereby reducing certain types of bias; however, it is highly susceptible to the influence of extreme values and outliers [19]. Feature normalization divides each feature value by its standard deviation, which improves the model even when the training data span multiple orders of magnitude [29] (equation (1)). It is therefore preferred for processes in which the scales of the collected variables differ substantially [30].
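A minimal sketch of the two preprocessing approaches described above, mean centering and feature normalization by the standard deviation; the array `X` is a hypothetical data matrix, and the exact form of equation (1) may differ from this standard formulation.

```python
import numpy as np

X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 4000.0]])   # hypothetical features on very different scales

# Mean centering: shift each feature so that its mean becomes zero.
X_centered = X - X.mean(axis=0)

# Feature normalization: divide each centered feature by its standard deviation,
# so variables that differ by orders of magnitude become comparable.
X_normalized = X_centered / X.std(axis=0)

print(X_normalized)
```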
After the data were normalized, the learning algorithm to be used for the actual process was selected. At this stage, the developer must make an informed decision based on the relationship between the model and the data and select a model suited to the characteristics of the process. Because the time-series data are collected over a period during which the process input conditions fluctuate, supervised learning with regression analysis was employed [31].
2.5. Algorithm Selection
This study proposes a prediction method for time-series data that trains an artificial neural network on preceding data using the developed model. Investigations of discrete sequence data (e.g., time-series data or text sentences) utilize recurrent neural networks (RNNs) based on the assumption that the data are sequentially dependent. However, when modeling with a basic recurrent neural network, the vanishing gradient problem may reduce the model accuracy if the data sequence of the target system is long. Therefore, further examinations were performed using long short-term memory (LSTM). LSTM is an algorithm designed to address the vanishing gradient problem by constructing a cell comprising a memory block, which contains three gates (input, output, and forget gates), and a hidden layer. Finally, the gated recurrent unit (GRU) algorithm, which was developed more recently, adopts a structure similar to that of the LSTM algorithm; a model is constructed by using reset and update gates to select the data. Consequently, it effectively addresses the gradient loss issue encountered in the RNN algorithm while offering a reduced workload compared with LSTM owing to its smaller number of weights. The model development in this investigation was carried out using the three aforementioned algorithms (RNN, LSTM, and GRU) together with the Python-based deep learning libraries TensorFlow and Keras. The training and test data were generated from actual process data collected over the course of one week (saved once every 30 s). The Adam optimizer was utilized for optimization [30, 31], and sigmoid and tanh served as the activation functions. This approach demonstrates that the RNN, LSTM, and GRU algorithms are well suited for developing models of machine learning-based distillation systems, providing new insights into their effective application.
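The following is a hedged sketch of how the three recurrent architectures can be declared with the tools named above (TensorFlow/Keras, tanh and sigmoid activations, Adam optimizer). The look-back window, layer sizes, and model factory are illustrative assumptions rather than the exact configuration reported later.

```python
from tensorflow import keras
from tensorflow.keras import layers

TIMESTEPS = 10   # assumed look-back window of 30-second samples
N_FEATURES = 4   # feed flow, reflux flow, steam flow, and bottom pressure

def build_model(cell="LSTM", units=24):
    """Build a single-output regression model with the chosen recurrent cell."""
    rnn_layers = {"RNN": layers.SimpleRNN, "LSTM": layers.LSTM, "GRU": layers.GRU}
    model = keras.Sequential([
        keras.Input(shape=(TIMESTEPS, N_FEATURES)),
        # tanh is the cell activation; LSTM/GRU use a sigmoid recurrent activation by default.
        rnn_layers[cell](units, activation="tanh"),
        layers.Dense(1),   # single regression output, e.g., the stage #64 temperature
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_model("GRU")
model.summary()
```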
2.6. Hyperparameter Tuning
Machine learning parameters fall into two classes. The first consists of weights and other variables that change as the model learns. The second comprises parameters that define the design of the machine learning model and are involved in regularization or the learning speed; these are the hyperparameters, which are categorized in Table 1.
Table 1
Hyperparameter characteristics.
| Type | Description | Considerations |
| Learning rate | Rate at which the model's weight parameters are updated during learning | Greatly affects learning speed and accuracy |
| Regularization | Variables for solving the overfitting problem | Regularization according to data characteristics |
| Epoch | Number of complete passes through the entire training dataset | Balances learning efficiency and model generalization |
| Batch size | Number of samples processed per weight update when the data are divided into batches | Consideration of available memory size and epoch performance |
| Hidden unit | Number of units that determines the learning capacity of the model | Increases in proportion to the hidden layer size |
| Weight initialization | Initial values assigned to the weight parameters | The same initial weights should be applied for reproducibility |
Illustrative techniques for optimizing hyperparameters include focused grid, manual, grid, random, and Bayesian search [32]. Manual search is an approach in which the researcher exercises discretion in locating hyperparameters. Random search determines the optimal hyperparameters by producing random numbers within a specified range [33]. Grid search predetermines the range and interval of each hyperparameter and evaluates every resulting combination of cases to find the optimum [34–37].
Bayesian search sets up a separate surrogate model, updates the parameters of that model, and searches for hyperparameters; therefore, the surrogate model itself also contains hyperparameters [38]. Among these methods, hyperparameter optimization in this study was performed using the basic grid search method to reduce complexity. The model development conditions for constructing the final recurrent neural network machine learning model were established through the following steps.
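A minimal sketch of such a grid search over batch size, epochs, and hidden layers; `build_stacked_model`, `X_train`, `y_train`, `X_val`, and `y_val` are hypothetical placeholders, and the candidate values are illustrative, not the full ranges examined in this study.

```python
import itertools

# Illustrative candidate grids; the study examined broader ranges.
batch_sizes = [64, 128, 256]
epoch_counts = [10, 50, 100]
hidden_layer_counts = [1, 3, 5]

best = None
for batch_size, epochs, n_layers in itertools.product(
        batch_sizes, epoch_counts, hidden_layer_counts):
    model = build_stacked_model(n_layers)   # hypothetical factory returning a compiled Keras model
    model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
              validation_data=(X_val, y_val), verbose=0)
    val_loss = model.evaluate(X_val, y_val, verbose=0)
    if best is None or val_loss < best[0]:
        best = (val_loss, batch_size, epochs, n_layers)

print("best (val_loss, batch_size, epochs, hidden_layers):", best)
```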
2.7. Batch Size and Epochs
The effectiveness of machine learning models depends on the type and characteristics of the data obtained; however, the quantity of data that can be utilized and stored is limited. The maximal batch size is therefore constrained by the amount of memory available in a particular hardware architecture. Conversely, if the batch size is too small, the fixed computational overhead associated with data structure management is incurred more often during model implementation, which is computationally inefficient. Furthermore, the accuracy of the gradient calculation does not substantially improve once the batch size exceeds a particular limit [39]. In addition, powers of two are frequently employed as minibatch sizes because most hardware architectures operate most efficiently with batches of that magnitude; the most prevalent values are 32, 64, 128, and 256 [40]. As the maximum batch size range depends on the quantity of data, the analysis was performed within the interval from a minimum batch size of one to a maximum batch size equal to the total amount of data.
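For reference, the iteration counts reported later in Table 3 follow approximately the standard relationship below, where the batch size and number of epochs jointly determine the number of weight updates and $N$ denotes the number of training samples; this is a sketch of the usual accounting, not a formula stated in the study.

```latex
\text{iterations} \;\approx\; \text{epochs} \times \left\lceil \frac{N}{\text{batch size}} \right\rceil
```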
2.8. Hidden Layers
The number of hidden layers determines the depth of the network, and the number of trainable parameters grows with both the algorithm and the hidden layer configuration (Table 4). Increasing the depth can improve the accuracy and precision of the model, but it also increases the computation time and the risk of overfitting. The number of hidden layers was therefore treated as a hyperparameter and optimized, together with the number of epochs, via grid search.
2.9. Optimizer
An optimization function is used to optimize the weights and biases, and various methods exist for this purpose. Optimization functions have been formulated by adding probability-based analysis techniques, learning rate control methods, and the concept of inertia, as stated in [31]. Deep learning typically requires substantial computational resources owing to the large number of weights that must be learned. Consequently, learning can be accelerated by employing stochastic gradient descent, which arbitrarily extracts samples from a subset of the data. An excessively high learning rate, one of the hyperparameters, may accelerate the learning progression but overshoot the optimization point, whereas an insufficient learning rate increases the learning time and leads to underfitting. To address this, the RMSProp optimization function adapts the learning rate in the direction of the optimal solution. Furthermore, linear regression utilizes a loss function graph in which the solution that minimizes the loss can be derived by identifying the single point at which the slope becomes zero. However, because of the multiple dimensions and complexity of machine learning, many such points may exist, and local minima, as opposed to the global minimum, may be captured in the overall loss function graph. This problem can be addressed with the momentum optimization function: by incorporating the notion of inertia into gradient descent, momentum prevents entrapment in local minima and permits learning to proceed. Finally, the Adam optimization function integrates the strengths of RMSProp and momentum; it dynamically modifies the learning rate in response to the gradient and resists becoming trapped in local minima owing to its inertia [34]. Consequently, the Adam optimization function was utilized in this investigation, and learning was executed with an initial learning rate of 0.001.
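A minimal sketch of configuring the optimizers discussed above in Keras, using the initial learning rate of 0.001 reported in this study; the `model` object is the hypothetical one from the earlier sketch, and the momentum value is an assumption.

```python
from tensorflow import keras

# Plain SGD, SGD with momentum, RMSprop, and Adam, which combines the
# adaptive learning rate of RMSprop with the inertia of momentum.
sgd      = keras.optimizers.SGD(learning_rate=0.001)
momentum = keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
rmsprop  = keras.optimizers.RMSprop(learning_rate=0.001)
adam     = keras.optimizers.Adam(learning_rate=0.001)   # optimizer used in this study

model.compile(optimizer=adam, loss="mse")
```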
2.10. Model Architecture
The initialization of a neural network is associated with issues regarding the stability of the model: if the initial weight values change, different results may be obtained even after repeated learning with the same data. To ensure that a constant initial value was always used, the model was constructed with the random seed fixed at a constant value using the Seed() function, which makes the iterative learning model reproducible. Nevertheless, despite consistently fixing the initial weight values, the learning outcome still fluctuates owing to the internal algorithm of the optimization function and the accumulation of iterative calculations; consequently, repeated learning does not yield entirely identical results.
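A hedged sketch of fixing the random seeds so that weight initialization is reproducible, as described above; the specific seed value and the use of the TensorFlow, NumPy, and Python seeding utilities are assumptions about how the Seed() step can be realized.

```python
import os
import random
import numpy as np
import tensorflow as tf

SEED = 42  # assumed seed value

os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)         # Python built-in RNG
np.random.seed(SEED)      # NumPy RNG used by many preprocessing steps
tf.random.set_seed(SEED)  # TensorFlow RNG that drives weight initialization
```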
2.11. Activation Function
The activation function passes the output of a node to the subsequent node and is a crucial component in the design of machine learning models. Activation functions are mainly distinguished by whether they produce unipolar outputs (0 to 1) or bipolar outputs (−1 to 1). Equation (2) [41] defines the sigmoid function, the fundamental unipolar activation function; related refinements include ReLU, Leaky ReLU, and PReLU [42–44].
A representative example of a bipolar activation function is the hyperbolic tangent (tanh function), which can be obtained by modifying the sigmoid function (equation (3)) into a hyperbolic function.
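For reference, the standard definitions of the sigmoid and hyperbolic tangent functions, which presumably correspond to equations (2) and (3) referenced above; the tanh function is a rescaled sigmoid.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\sigma(2x) - 1
```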
This study analyzed RNN, LSTM, and GRU algorithms composed of two activation functions [45–47].
2.12. Loss Function
Model optimization was performed using the loss function, which calculates the difference between the predicted and actual values. The loss function used in the validation model to assess the performance of the neural network guided the learning of the model in the correct direction. To regulate the output value of the constructed model, a score is computed from the difference between the predicted and actual values. In machine learning, loss functions such as the mean squared error (MSE), mean absolute error (MAE), binary cross-entropy, and categorical cross-entropy are utilized; cross-entropy can be categorical or sparse [48]. Because weight adjustments are made via these loss values and the resulting model characteristics vary substantially, the application strategy must be chosen according to the intended process. This study utilized the MSE, a metric commonly employed in regression problems. As shown in equation (4), the MSE grows as the error increases and shrinks as the error decreases.
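The MSE presumably takes its standard form (equation (4)), where $y_i$ are the measured values, $\hat{y}_i$ the model predictions, and $n$ the number of samples:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}
```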
2.13. Model Evaluation
The correlation between the actual operating data and the simulation outcomes was validated using the coefficient of determination (equation (5)). Furthermore, the root mean square error (equation (6)) was utilized to assess the accuracy and precision of the model, enabling the optimal evaluation of the system [49, 50].
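These evaluation metrics presumably take their standard forms (equations (5) and (6)), with $\bar{y}$ denoting the mean of the measured values:

```latex
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}
```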
The Materials and Methods section presents a literature review on various aspects of data mining, including dimensionality reduction and feature extraction for the gathered data. This section introduces different categories of time-series algorithms, hyperparameter optimization techniques, and machine learning models. Furthermore, the section discusses methods for evaluating the models, including the activation, optimization, and loss functions that comprise the model. The actual chemical process is described in the Development of ML Model section, and the outcomes of the hyperparameters are examined after the construction of a machine learning model based on the attributes of the gathered data.
3. Materials and Methods
The process of optimizing the hyperparameters of the machine learning model presented in this study was performed in the order illustrated in Figure 2, which is outlined as follows:
(a) Dimensionality reduction-based data refining stage for selecting the gathered data for the target procedure
(b) Selection phase for time series algorithms (RNN, LSTM, and GRU)
(c) Steps for optimizing the batch size in accordance with the data size
(d) Model hyperparameter tuning (hidden layer and epoch) via grid search in accordance with the sample size and algorithm selected
(e) Hyperparameter selection stage for the optimal target process by comparing the precision and accuracy of the developed model
[figure(s) omitted; refer to PDF]
4. Development of the ML Model
4.1. Description of the Target Process
The development model in this study focused on a 78-stage distillation process system, the most fundamental facility in the petrochemical industry, whose purpose is to separate mixtures and generate products with a purity of 99 wt% or higher (Figure 3). The distillation process is pivotal in the petrochemical industry because it efficiently separates hydrocarbon mixtures into components of the desired purity. The feedstock from the mixed butane storage tank is injected into stage #35 of the column at a pressure of 7.7 kg/cm²·g via a pump. The feedstock introduced into the column is flashed and brought into contact with the HC vapor produced by the reboiler, a device that heats the mixture at the bottom of the column to generate vapor; the light component ascends to the uppermost section of the tower, whereas the heavy component descends to the lowermost section. The light HC component that ascends to the top of the tower passes through the condenser and is retained in the reflux drum. The reflux drum, which collects the vapor condensed at the top of the tower, is crucial in enhancing the purity of the distillation products because part of the condensed liquid is recycled back into the process: a fraction is transferred to the storage reservoir, and the rest is refluxed to the top of the tower. Furthermore, the residual components descend towards the lowermost section of the tower, where the product is withdrawn from stage #64, the stage richest in n-C4. Finally, the heavy HC component accumulated at the base of the tower is separated and discharged using the reboiler. This separation is essential for ensuring the efficiency and effectiveness of the distillation process. The variability of the raw materials used in the process (raw material 1, 60–80 wt%; raw material 2, 96–98 wt%) results in frequent changes in the operating conditions, which hinders efficient operation. Therefore, we wish to develop and implement machine learning-based empirical models for stable operation. We constructed a machine learning prediction model to account for the substantial variation in process efficiency that occurs during production due to fluctuations in the actual internal stage #64 temperature, which is a key indicator of process stability. This model leverages data on the flow rate of mixed butane, the bottom pressure, and the reflux flow rate, the factors that most significantly influence the stage #64 temperature, to predict temperature changes accurately. Incorporating machine learning models offers a promising solution for adaptively managing these variations, thereby optimizing the distillation process for enhanced performance and reliability.
[figure(s) omitted; refer to PDF]
4.2. Process Data
Once every 30 s, the process data were gathered, and a model was constructed to predict the temperature of stage #64 using the input values of the feed flow rate, reflux flow rate, steam flow rate, and bottom pressure. These variables were selected based on previous studies, which showed that the reflux flow, reboiler steam, and bottom pressure are highly correlated with the temperature of stage #64 [31]. As outlined in Figure 3, the actual components entering the system in the datasets for July (a) and October (b) differ due to the use of varying feedstocks. The reason for this choice was to validate the robustness of the modeling results under different operational conditions. In conclusion, the analysis utilized 12,030 (July) training data points and 20,073 (October) test data points, as shown in Figure 4. The data for July were collected over 6 days, resulting in 12,030 data points, and the data for October were collected over 7 days, resulting in 20,073 data points, each correlating with the introduction of different feedstocks into the storage tank. This differentiation is crucial to ensure that the model is trained and validated on distinct, nonoverlapping scenarios to maintain the integrity of the test results [31]. The key to process control is adjusting the feedstock amount and steam supplied to maintain the temperature of stage #64 at an optimal 72°C, a critical parameter reflected in the data. The actual data collection occurred every 30 s, which is necessary for a system that requires checking and logging data at least every 30 s to address any anomalies that may arise within minute intervals. While outlier analysis and model stabilization are possible, they can lead to a model that may not respond adequately to unusual conditions. Hence, the model being applied to the current system is an initial development phase model aimed at addressing anomalies as they occur. Further research can analyze the appropriate number of data points and modeling to manage the computational load.
[figure(s) omitted; refer to PDF]
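A hedged sketch of turning the 30-second process log into supervised learning samples: each sample pairs a short history of the four input variables with the stage #64 temperature at the next time step. The window length and the random placeholder arrays are assumptions for illustration.

```python
import numpy as np

def make_sequences(inputs, target, window=10):
    """Slice a time-ordered array of input variables into (window, features)
    samples, each labeled with the target value at the next time step."""
    X, y = [], []
    for i in range(len(inputs) - window):
        X.append(inputs[i:i + window])
        y.append(target[i + window])
    return np.array(X), np.array(y)

# Hypothetical arrays standing in for 12,030 rows of
# [feed flow, reflux flow, steam flow, bottom pressure] and the stage #64 temperature.
inputs = np.random.rand(12030, 4)
temperature = np.random.rand(12030)

X_train, y_train = make_sequences(inputs, temperature, window=10)
print(X_train.shape, y_train.shape)   # (12020, 10, 4) (12020,)
```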
4.3. Model Validation and Verification
As shown in Figure 5, the data gathered for model development were categorized into training, validation, and verification data. The actual weight values were adjusted using the training data, whereas algorithm selection and parameter adjustment were guided by the validation data. The accuracy of the developed model and the generalized conditions that ascertain its suitability were assessed using the verification data, for which representative data were used. Field application is unachievable if the model's precision is compromised as a result of overfitting or underfitting; therefore, enhancing the generalization performance is essential. Overfitting can be mitigated by decreasing the number of parameters incorporated into the model; in addition, dropout, weight regularization, and network capacity reduction techniques can be employed to achieve greater precision. Conversely, the number of calculations and the dimensions of the model must be increased to prevent underfitting. Numerous analysis methods are available to prevent overfitting, including early stopping, ensemble approaches that balance network width and depth, and regularization, and an analysis was performed to determine the most suitable method for the target process. Furthermore, the issue of gradient vanishing can be resolved by implementing batch normalization, the conjugate gradient method, and an adaptive learning rate [29].
[figure(s) omitted; refer to PDF]
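A minimal sketch of the overfitting countermeasures mentioned above (dropout, weight regularization, and early stopping) in Keras; the layer sizes, regularization strength, dropout rate, and patience are illustrative assumptions, and the training arrays are the hypothetical ones from the earlier sketch.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(10, 4)),                                   # assumed window of 4 input variables
    layers.LSTM(24, kernel_regularizer=regularizers.l2(1e-4)),    # weight regularization
    layers.Dropout(0.2),                                          # randomly drop units during training
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping halts training when the validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                           restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, batch_size=256, callbacks=[early_stop])
```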
4.4. Application Process Algorithm
Because the operator determines the process operation plan from the time-series data, supervised learning was selected as the learning method for the regression analysis model. The feed flow, steam flow, reflux flow, and bottom pressure data were utilized in the analysis, as determined in prior research [31]. Figure 6 demonstrates the application of three sequential models, RNN, LSTM, and GRU, for processing the input data. The RNN model (Figure 6(a)) processes data through its recurrent structure but may suffer from the vanishing gradient problem as the sequence lengthens. To address this issue, the LSTM model (Figure 6(b)) includes memory blocks with input, output, and forget gates to preserve important information over long periods. Meanwhile, the GRU model (Figure 6(c)) simplifies the architecture by merging the gates and thus requires fewer parameters, reducing the computational demand while still addressing the vanishing gradient problem. Here, the data are combined with the cell state and hidden unit of the previous moment and subsequently trained using the input values from the set of four variables. Finally, the transmitted data were consolidated into a single value via dimension reduction using a dense layer. Normalization was performed to eliminate errors introduced by the varying scales of the data attributes: each variable was normalized to the 0-1 range using the maximum and minimum values of the target process, as listed in Table 2. Model learning was carried out in a Python 3.6 environment on an Intel(R) Xeon Silver 4110 processor operating at 2.10 GHz with 64 GB of memory.
[figure(s) omitted; refer to PDF]
Table 2
Maximum and minimum values for normalization.
| Type | Description | Min | Max |
| Flow rate (kg/h) | Mixed butane feedstock flow | 1,000 | 7,000 |
| | Reboiler steam flow | 1,000 | 5,000 |
| | Reflux flow | 7,000 | 30,000 |
| Pressure (kg/cm²) | Splitter bottom pressure | 7.2 | 8.2 |
| Temperature (°C) | Splitter #64 tray temperature | 69 | 74 |
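A minimal sketch of the 0-1 min-max normalization applied with the bounds of Table 2; the column ordering of the four input variables is an assumption, and the target temperature can be scaled the same way using its 69-74°C bounds.

```python
import numpy as np

# Per-variable (min, max) bounds taken from Table 2:
# feed flow, reboiler steam flow, reflux flow, bottom pressure.
BOUNDS = np.array([[1000.0, 7000.0],
                   [1000.0, 5000.0],
                   [7000.0, 30000.0],
                   [7.2, 8.2]])

def normalize(X):
    """Scale each column of X into the 0-1 range using its Table 2 bounds."""
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    return (X - lo) / (hi - lo)

sample = np.array([[4000.0, 3000.0, 18500.0, 7.7]])
print(normalize(sample))   # each entry now lies between 0 and 1
```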
5. Optimization Results for Hyperparameters
5.1. Algorithm and Epoch
To examine the attributes of the target process according to the machine learning model algorithm, learning was initiated with one hidden layer and a batch size of 256. To discern actual instances of overfitting and underfitting for the model algorithms, we incrementally increased the number of epochs, using 256 as the standard batch size while evaluating the characteristics. The training model depicted in Figure 7(a) shows a substantial decline in the precision of the LSTM after 100 epochs. In contrast, neither the RNN nor the GRU algorithm showed overfitting characteristics, that is, a loss of generalization with increasing epochs. This is thought to be a consequence of their simpler model configurations compared with the LSTM algorithm, which gives them robust characteristics.
[figure(s) omitted; refer to PDF]
In addition, Figure 7(b) shows the corresponding behavior of the verification model relative to the benchmark for acceptable accuracy.
5.2. Batch Size
To optimize the model development for the specified procedure, analyses of the algorithm and batch size were conducted. The learning process was considered complete when the coefficient of determination (R²) of the validation model reached the target value of 0.9 (Table 3).
As shown in Table 3, the duration of the model development increased with the number of iterations for all algorithms. In particular, to produce models with equivalent performance at the target R², the RNN algorithm required far more iterations on average (7,600) than the LSTM (3,074) and GRU (2,028) algorithms.
Furthermore, the generalization characteristic analysis revealed that the average verification R² was highest for the LSTM algorithm (0.7180), followed by the RNN (0.7123) and GRU (0.6421) algorithms.
5.3. Analysis Result According to the Hyperparameter of Grid Search
The number of hidden units in machine learning models varies according to the algorithmic properties and the structure of the model: it depends on the number of hidden layers selected during the analysis and on the bias terms derived from the algorithm characteristics. Relative to the basic RNN algorithm, the GRU algorithm requires three times as many parameters and the LSTM algorithm four times as many. Although augmenting the number of hidden units may enhance the model's accuracy and precision, it also increases the calculation time and introduces the possibility of overfitting; thus, an optimal value is imperative. Table 4 lists how the number of model parameters changes with the algorithm and the number of hidden units for the four input variables examined in this study. The grid search method was utilized to assess the impact of increasing the number of hidden layers and epochs on model performance.
Table 3
Model characteristic change according to the batch size and the algorithm (target R² = 0.9).
| Batch size | RNN (number of hidden units = 6) | LSTM (number of hidden units = 24) | GRU (number of hidden units = 18) |
| | Epoch | Iteration | R² validation | R² verification | RMSE validation | RMSE verification | Epoch | Iteration | R² validation | R² verification | RMSE validation | RMSE verification | Epoch | Iteration | R² validation | R² verification | RMSE validation | RMSE verification |
| 1 | 2 | 24026 | 0.9312 | 0.7659 | 0.0976 | 0.2252 | 1 | 12013 | 0.9146 | 0.7901 | 0.2307 | 0.2562 | 1 | 12013 | 0.9434 | 0.7774 | 0.2149 | 0.2608 |
| 2 | 2 | 12013 | 0.9119 | 0.7293 | 0.0858 | 0.1988 | 2 | 12013 | 0.9162 | 0.8248 | 0.0900 | 0.1434 | 1 | 6007 | 0.9102 | 0.6964 | 0.1134 | 0.2071 |
| 4 | 4 | 12013 | 0.9192 | 0.7455 | 0.0993 | 0.1734 | 2 | 6007 | 0.9114 | 0.7632 | 0.1241 | 0.1815 | 2 | 6007 | 0.9189 | 0.7130 | 0.1431 | 0.1950 |
| 8 | 5 | 7508 | 0.9004 | 0.7079 | 0.0922 | 0.2021 | 4 | 6007 | 0.9117 | 0.7765 | 0.0871 | 0.1740 | 1 | 1502 | 0.9055 | 0.6271 | 0.1374 | 0.2285 |
| 16 | 10 | 7508 | 0.9069 | 0.7154 | 0.1154 | 0.1906 | 2 | 1502 | 0.9028 | 0.7068 | 0.0961 | 0.2003 | 1 | 751 | 0.9001 | 0.6113 | 0.2036 | 0.2855 |
| 32 | 17 | 6382 | 0.9027 | 0.7063 | 0.0951 | 0.1932 | 3 | 1126 | 0.9021 | 0.6970 | 0.1051 | 0.2236 | 3 | 1126 | 0.9154 | 0.6299 | 0.1588 | 0.2467 |
| 64 | 33 | 6194 | 0.9025 | 0.7077 | 0.1143 | 0.1932 | 5 | 939 | 0.9029 | 0.6937 | 0.1019 | 0.2278 | 5 | 939 | 0.9129 | 0.6248 | 0.1717 | 0.2586 |
| 128 | 59 | 5537 | 0.9005 | 0.7009 | 0.1045 | 0.1945 | 10 | 939 | 0.9003 | 0.6882 | 0.1000 | 0.2311 | 8 | 751 | 0.9005 | 0.6181 | 0.1689 | 0.2584 |
| 256 | 113 | 5303 | 0.9002 | 0.7009 | 0.0942 | 0.1923 | 18 | 845 | 0.9005 | 0.6987 | 0.0968 | 0.2297 | 17 | 798 | 0.9042 | 0.6201 | 0.1639 | 0.2538 |
| 512 | 213 | 4998 | 0.9002 | 0.7010 | 0.1046 | 0.1959 | 36 | 845 | 0.9019 | 0.6907 | 0.0919 | 0.2218 | 32 | 751 | 0.9028 | 0.6191 | 0.1777 | 0.2643 |
| 1024 | 414 | 4857 | 0.9001 | 0.7005 | 0.0912 | 0.1954 | 70 | 821 | 0.9009 | 0.6891 | 0.0958 | 0.2282 | 65 | 763 | 0.9000 | 0.6187 | 0.1584 | 0.2500 |
| 2048 | 813 | 4769 | 0.9002 | 0.7003 | 0.0998 | 0.1960 | 138 | 809 | 0.9005 | 0.6882 | 0.0938 | 0.2264 | 127 | 745 | 0.9005 | 0.6189 | 0.1585 | 0.2499 |
| 4096 | 1593 | 4672 | 0.9001 | 0.7006 | 0.0972 | 0.1964 | 275 | 807 | 0.9003 | 0.6880 | 0.0950 | 0.2280 | 253 | 742 | 0.9001 | 0.6188 | 0.1570 | 0.2488 |
| 8192 | 2366 | 3470 | 0.9003 | 0.7004 | 0.0957 | 0.1964 | 410 | 601 | 0.9000 | 0.6875 | 0.0949 | 0.2280 | 379 | 556 | 0.9001 | 0.6187 | 0.1581 | 0.2494 |
| 12013 | 4755 | 4755 | 0.9009 | 0.7026 | 0.0970 | 0.1960 | 820 | 820 | 0.9001 | 0.6876 | 0.0950 | 0.2283 | 757 | 757 | 0.9000 | 0.6188 | 0.1577 | 0.2492 |
| Average | 693.2 | 7600.3 | 0.9052 | 0.7123 | 0.0989 | 0.1959 | 119.7 | 3073.8 | 0.9044 | 0.7180 | 0.1065 | 0.2152 | 110.1 | 2028.3 | 0.9076 | 0.6421 | 0.1629 | 0.2471 |
In the initial training phase, particularly between epochs 1 and 10 as depicted in Figure 8, all algorithm models generally displayed accuracies below 0.5. This pattern indicates underfitting, suggesting that the models had not yet learned enough to adequately capture the complexities of the data. As the epochs increased, a trend of rising accuracy followed by a decrease was observed, suggesting overfitting, in which the models began to lose their generalization ability and became overly tailored to the training data.
[figure(s) omitted; refer to PDF]
At epoch 73, with seven hidden layers, the RNN validation model achieved its highest accuracy of 0.91, as evidenced by the grid search results in Figure 8(a). Furthermore, in Figure 8(b), the verification model achieved an accuracy of not less than 0.8 and exhibited the same trend as the validation model, demonstrating that the RNN model possesses robust characteristics.
Because of the long-term memory and forgetting mechanisms within its internal algorithm, the LSTM validation model in Figure 8(c) achieved an accuracy of 0.92 or higher with the same seven hidden layers as the RNN algorithm, and a comparable level of model accuracy was observed across all hidden layers during epochs 20–50. The gate structure is thus believed to have reduced the gradient loss effect. In contrast to the RNN, the LSTM verification model achieved its peak performance with 5–9 hidden layers, as shown in Figure 8(d). Overfitting was observed under all hidden layer conditions when the number of epochs reached or exceeded 100. Consequently, the optimal hyperparameters of the LSTM model were determined to be five hidden layers and 50 epochs.
The validation model of the GRU in Figure 8(e) achieves its highest accuracy with the fewest hidden layers of the three algorithms, specifically with three hidden layers. Above five hidden layers, significant fluctuations appear, and with ten hidden layers, the accuracy falls below the acceptable benchmark.
The precision outcomes of the models used for algorithm validation are shown in Figures 9(a), 9(c), and 9(e). They follow a trend comparable to that of the accuracy shown in Figure 8, revealing a tradeoff relationship within the simulation model: the regions of highest accuracy do not always coincide with the regions of best precision, so both metrics must be considered when selecting the hyperparameters.
[figure(s) omitted; refer to PDF]
5.4. Results of Optimal Conditions
As shown in Figure 10, the optimization conditions obtained via the grid search method (batch size: 256, hidden units: 6, and epochs: 50) were compared with the stage #64 temperature predictions of the model developed using the LSTM algorithm and with the actual operating data. The validation model of the LSTM algorithm in Figure 10(a) exhibits an accuracy of 0.9040 and a precision of 0.1068 (Figure 10(b)). The outcomes generated by the verification model were better suited to the training and test data than those generated by the RNN and GRU algorithms (accuracy: 0.8094; precision: 0.1644). In addition, the models produced by the RNN and GRU algorithms exhibit lower accuracy; nevertheless, they offer faster calculation speeds than the LSTM algorithm, as indicated in Table 4, owing to their smaller numbers of hidden units. Thus, when rapid results are required and a lower model accuracy is acceptable, the RNN or GRU algorithms may be used as alternatives. LSTM was chosen as the optimal algorithm for the distillation process system investigated in this study because the stage #64 temperature must be predicted precisely: a deviation of approximately 0.1°C at stage #64 within the distillation tower corresponds to a change in process efficiency of 2% or more.
[figure(s) omitted; refer to PDF]
Table 4
Machine learning model parameters according to algorithm and hidden unit.
| Algorithm | Number of hidden units |
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| RNN | 6 | 14 | 24 | 36 | 56 | 66 | 84 | 104 | 126 | 150 |
| LSTM | 24 | 56 | 96 | 144 | 200 | 264 | 336 | 416 | 504 | 600 |
| GRU | 18 | 42 | 72 | 108 | 150 | 198 | 252 | 312 | 378 | 450 |
6. Conclusions
The effectiveness of machine learning models depends on the quality and quantity of the data provided. Hence, for optimal model learning, the data should exclusively comprise precise and pertinent information, neither excessive nor insufficient. To ensure high data quality, we employed rigorous data cleaning and preprocessing techniques to remove outliers and normalize the data, which are critical for training reliable models. Accurate information can be selected and implemented when comprehensive background knowledge of the target process is available; when this is not feasible, suitable data selection methods must be employed. Moreover, a machine learning model specifically designed for a restricted system acquires knowledge of every attribute associated with that system. Hence, although achieving faultless optimal process operation using the existing model is impossible, it is feasible to establish guidelines that enable adaptable responses to emergencies that may arise throughout the operational phase of the process.
The algorithms comprising these machine learning models may differ according to the data attributes, whereas the parameters function as dependent variables that are dynamically modified throughout the training process. In selecting the optimal algorithm, we considered the computational efficiency and the ability of each algorithm to handle nonlinear relationships in data, leading to the choice among LSTM, GRU, and traditional RNNs based on their respective strengths and weaknesses. Nevertheless, hyperparameters are not the outcomes of data analysis; rather, they are independent variables that necessitate determination; thus, an absolute optimal value does not exist. Hence, it is imperative to employ an iterative learning modeling process to obtain the most effective hyperparameters. A hyperparameter tuning-based optimization method is introduced in this study, and the resulting model is constructed and validated using time-series data obtained from the operational distillation tower separation procedure. In conclusion, the hyperparameters and optimal algorithm (the LSTM algorithm with the Adam optimization function) were determined as follows: a batch size of 256, 50 epochs, and 6 hidden units.
The quantity of learning required to derive the hyperparameters varies according to the initial value settings, and the model characteristics change according to the training and verification test data when repeated learning is used to develop a machine learning model. This iterative process highlights the dynamic nature of machine learning, where ongoing adjustments and refinements are necessary to adapt to new data and evolving operational conditions. Furthermore, in addition to the aforementioned attributes, the proficiency of an operator in process operations and the dimensions and properties of the target process system can influence the determination of numerous hyperparameters. Consequently, for effective learning models, model development and administration techniques must diverge from those used in current theoretical simulations. Future research should focus on integrating more sophisticated adaptive learning techniques to improve model resilience and performance in unpredictable real-world environments. In addition, specific and supplementary optimization techniques, such as methods for improving models via algorithm refinement and approaches that involve modifying activation functions and model structures to alter characteristics that were not addressed in this study, should be pursued for implementation in real-world processes. Fundamental research was undertaken in this study with the intention of developing and applying an optimized machine learning model, via hyperparameter optimization, to real-world processes. Although artificial intelligence algorithms are often regarded as black boxes, this study verified that suitable hyperparameters can be defined for a system with a specified control volume, which can provide machine learning researchers with fundamental insights.
Authors’ Contributions
Junghwan Kim and DaeHyun Kim equally contributed to this study.
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) Grant Funded by the Korea Government (MSIT) (nos. 2021R1A6A1A0304424211 and 2022R1C1C2009821).
Glossary
Abbreviations
Adam: Adaptive moment estimation
ANN: Artificial neural network
DCS: Distributed control system
GD: Gradient descent
GRU: Gated recurrent unit
LSTM: Long short-term memory
MSE: Mean squared error
R: Pearson correlation coefficient
RMSE: Root mean squared error
RNN: Recurrent neural network
SGD: Stochastic gradient descent
[1] Y. Choi, B. Bhadriaju, H. Cho, J. Lim, I. S. Han, I. Moon, J. S. I. Kwon, J. Kim, "Data-driven modeling of multimode chemical process: validation with a real-world distillation column," Chemical Engineering Journal, vol. 457,DOI: 10.1016/j.cej.2022.141025, 2023.
[2] N. An, S. Hong, Y. Kim, H. Cho, J. Lim, I. Moon, J. Kim, "Dual attention-based multi-step ahead prediction enhancement for monitoring systems in industrial processes," Applied Soft Computing, vol. 147,DOI: 10.1016/j.asoc.2023.110763, 2023.
[3] H. Park, J. Roh, K. c. Oh, H. Cho, J. Kim, "Modeling and optimization of water mist system for effective air-cooled heat exchangers," International Journal of Heat and Mass Transfer, vol. 184,DOI: 10.1016/j.ijheatmasstransfer.2021.122297, 2022.
[4] J. Lim, H. Cho, H. Kwon, H. Park, J. Kim, "Reinforcement learning-based optimal operation of ash deposit removal system to improve recycling efficiency of biomass for CO2 reduction," Journal of Cleaner Production, vol. 370,DOI: 10.1016/j.jclepro.2022.133605, 2022.
[5] H. Park, H. Kwon, H. Cho, J. Kim, "A framework for energy optimization of distillation process using machine learning‐based predictive model," Energy Science and Engineering, vol. 10 no. 6, pp. 1913-1924, DOI: 10.1002/ese3.1134, 2022.
[6] C. Joo, H. Park, H. Kwon, J. Lim, E. Shin, H. Cho, J. Kim, "Machine learning approach to predict physical properties of polypropylene composites: application of MLR, DNN, and random forest to industrial data," Polymers, vol. 14 no. 17,DOI: 10.3390/polym14173500, 2022.
[7] X. Zhang, Y. Zou, S. Li, S. Xu, "A weighted autoregressive LSTM based approach for chemical processes modeling," Neurocomputing, vol. 367, pp. 64-74, DOI: 10.1016/j.neucom.2019.08.006, 2019.
[8] T. Hope, Y. S. Resheff, I. Lieder, Learning Tensorflow: A Guide to Building Deep Learning Systems, 2017.
[9] A. C. Müller, S. Guido, Introduction to Machine Learning with Python: A Guide for Data Scientists, 2016.
[10] M. Ariga, S. Nakayama, D. Nishibayasi, Machine Learning at Work, 2018.
[11] S. Y. Kim, Y. J. Jung, First Learning Machine Learning, 2017.
[12] S. L. Smith, P. J. Kindermans, C. Ying, Q. V. Le, "Don’t decay the learning rate, increase the batch size," arXiv preprint arXiv:1711.00489,DOI: 10.48550/arXiv.1711.00489, 2017.
[13] L. N. Smith, "A disciplined approach to neural network hyperparameters: Part 1: learning rate, batch size, momentum, and weight decay," arXiv preprint arXiv:1803.09820,DOI: 10.48550/arXiv.1803.09820, 2018.
[14] C. Yu, X. Qi, H. Ma, X. He, C. Wang, Y. Zhao, "LLR : learning learning rates by LSTM for training neural networks," Neurocomputing, vol. 394, pp. 41-50, DOI: 10.1016/j.neucom.2020.01.106, 2020.
[15] G. Lederrey, V. Lurkin, T. Hillel, M. Bierlaire, "Estimation of discrete choice models with hybrid stochastic adaptive batch size algorithms," Journal of Choice Modelling, vol. 38,DOI: 10.1016/j.jocm.2020.100226, 2021.
[16] W. W. Tso, B. Burnak, E. N. Pistikopoulos, "HY-POP: hyperparameter optimization of machine learning models through parametric programming," Computers and Chemical Engineering, vol. 139,DOI: 10.1016/j.compchemeng.2020.106902, 2020.
[17] K. Samiee, A. Iosifidis, M. Gabbouj, "On the comparison of random and Hebbian weights for the training of single-hidden layer feedforward neural networks," Expert Systems with Applications, vol. 83, pp. 177-186, DOI: 10.1016/j.eswa.2017.04.025, 2017.
[18] K. G. Sheela, S. N. Deepa, "Review on methods to fix number of hidden neurons in neural networks," Mathematical Problems in Engineering, vol. 2013,DOI: 10.1155/2013/425740, 2013.
[19] B. Song, S. Tan, H. Shi, "Key principal components with recursive local outlier factor for multimode chemical process monitoring," Journal of Process Control, vol. 47, pp. 136-149, DOI: 10.1016/j.jprocont.2016.09.006, 2016.
[20] K. R. Gabriel, "The biplot graphic display of matrices with application to principal component analysis," Biometrika, vol. 58 no. 3, pp. 453-467, DOI: 10.1093/biomet/58.3.453, 1971.
[21] P. Bojanowski, A. Joulin, D. Lopez-Paz, A. Szlam, "Optimizing the latent space of generative networks," arXiv preprint arXiv:1707.05776,DOI: 10.48550/arXiv.1707.05776, 2017.
[22] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society- Series B: Statistical Methodology, vol. 58 no. 1, pp. 267-288, DOI: 10.1111/j.2517-6161.1996.tb02080.x, 1996.
[23] S. Nowozin, "Improved information gain estimates for decision tree induction," ,DOI: 10.48550/arXiv.1206.4620, 2012.
[24] I. Kononenko, "Estimating attributes: analysis and extensions of RELIEF," European Conference on Machine Learning, pp. 171-182, 1994.
[25] H. Peng, F. Long, C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27 no. 8, pp. 1226-1238, DOI: 10.1109/TPAMI.2005.159, 2005.
[26] Q. Gu, Z. Li, J. Han, "Generalized Fisher score for feature selection," ,DOI: 10.48550/arXiv.1202.3725, 2012.
[27] X. He, D. Cai, P. Niyogi, "Laplacian score for feature selection," Advances in Neural Information Processing Systems, vol. 18, 2005.
[28] Z. Zhao, L. Wang, H. Liu, "Efficient spectral feature selection with minimum redundancy," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 24 no. 1, pp. 673-678, DOI: 10.1609/aaai.v24i1.7671, 2010.
[29] C. A. Charu, Neural Networks and Deep Learning: A Textbook, 2018.
[30] K. Oh, H. Kwon, J. Roh, Y. Choi, H. Park, H. Cho, J. Kim, "Development of machine learning-based platform for distillation column," Korean Chemical Engineering Research, vol. 58 no. 4, pp. 565-572, DOI: 10.9713/kcer.2020.58.4.565, 2020.
[31] H. Kwon, K. C. Oh, Y. Choi, Y. G. Chung, J. Kim, "Development and application of machine learning‐based prediction model for distillation column," International Journal of Intelligent Systems, vol. 36 no. 5, pp. 1970-1997, DOI: 10.1002/int.22368, 2021.
[32] L. Yao, Z. Fang, Y. Xiao, J. Hou, Z. Fu, "An intelligent fault diagnosis method for lithium battery systems based on grid search support vector machine," Energy, vol. 214,DOI: 10.1016/j.energy.2020.118866, 2021.
[33] J. Bergstra, Y. Bengio, "Random search for hyper-parameter optimization," Journal of Machine Learning Research, vol. 13 no. 2, 2012.
[34] M. A. Amirabadi, M. H. Kahaei, S. A. Nezamalhosseini, "Novel suboptimal approaches for hyperparameter tuning of deep neural network [under the shelf of optical communication," Physical Communication, vol. 41,DOI: 10.1016/j.phycom.2020.101057, 2020.
[35] F. J. Pontes, G. F. Amorim, P. P. Balestrassi, A. P. Paiva, J. R. Ferreira, "Design of experiments and focused grid search for neural network parameter optimization," Neurocomputing, vol. 186, pp. 22-34, DOI: 10.1016/j.neucom.2015.12.061, 2016.
[36] Á. Barbero Jiménez, J. López Lázaro, J. R. Dorronsoro, "Finding optimal model parameters by deterministic and annealed focused grid search," Neurocomputing, vol. 72 no. 13–15, pp. 2824-2832, DOI: 10.1016/j.neucom.2008.09.024, 2009.
[37] J. M. Dixon, H. Du, D. G. Cork, J. S. Lindsey, "An experiment planner for performing successive focused grid searches with an automated chemistry workstation," Chemometrics and Intelligent Laboratory Systems, vol. 62 no. 2, pp. 115-128, DOI: 10.1016/S0169-7439(02)00009-6, 2002.
[38] K. N. Pai, V. Prasad, A. Rajendran, "Experimentally validated machine learning frameworks for accelerated prediction of cyclic steady state and optimization of pressure swing adsorption processes," Separation and Purification Technology, vol. 241,DOI: 10.1016/j.seppur.2020.116651, 2020.
[39] N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang, "On large-batch training for deep learning: generalization gap and sharp minima," ,DOI: 10.48550/arXiv.1609.04836, 2016.
[40] I. Kandel, M. Castelli, "The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset," ICT express, vol. 6 no. 4, pp. 312-315, DOI: 10.1016/j.icte.2020.04.010, 2020.
[41] J. Han, C. Moraga, "The influence of the sigmoid function parameters on the speed of backpropagation learning," International Workshop on Artificial Neural Networks, pp. 195-201, 1995.
[42] S. Mastromichalakis, "ALReLU: a different approach on Leaky ReLU activation function to improve neural networks performance," ,DOI: 10.48550/arXiv.2012.07564, 2020.
[43] B. Xu, N. Wang, T. Chen, M. Li, "Empirical evaluation of rectified activations in convolutional network," ,DOI: 10.48550/arXiv.1505.00853, 2015.
[44] M. A. Mercioni, S. Holban, "The most used activation functions: classic versus current," 2020 International Conference on Development and Application Systems (DAS), pp. 141-145, DOI: 10.1109/DAS49615.2020.9108942, 2020.
[45] J. Wang, J. Yan, C. Li, R. X. Gao, R. Zhao, "Deep heterogeneous GRU model for predictive analytics in smart manufacturing: application to tool-wear prediction," Computers in Industry, vol. 111,DOI: 10.1016/j.compind.2019.06.001, 2019.
[46] X. Wei, L. Zhang, H. Q. Yang, L. Zhang, Y. P. Yao, "Machine learning for pore-water pressure time-series prediction: application of recurrent neural networks," Geoscience Frontiers, vol. 12 no. 1, pp. 453-467, DOI: 10.1016/j.gsf.2020.04.011, 2021.
[47] S. Hochreiter, J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9 no. 8, pp. 1735-1780, DOI: 10.1162/neco.1997.9.8.1735, 1997.
[48] A. K. Sharma, G. Aggarwal, S. Bhardwaj, P. Chakrabarti, T. Chakrabarti, J. H. Abawajy, S. Bhattacharyya, R. Mishra, A. Das, H. Mahdin, "Classification of Indian classical music with time-series matching deep learning approach," IEEE Access, vol. 9, pp. 102041-102052, DOI: 10.1109/ACCESS.2021.3093911, 2021.
[49] K. Oh, J. Kim, S. Park, S. Kim, L. Cho, C. Lee, J. Roh, D. Kim, "Development and validation of torrefaction optimization model applied element content prediction of biomass," Energy, vol. 214,DOI: 10.1016/j.energy.2020.119027, 2021.
[50] K. Oh, S. Park, S. Kim, Y. Choi, C. Lee, L. Cho, D. Kim, "Development and validation of mass reduction model to optimize torrefaction for agricultural byproduct biomass," Renewable Energy, vol. 139, pp. 988-999, DOI: 10.1016/j.renene.2019.02.106, 2019.
Copyright © 2024 Kwang Cheol Oh et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/