1. Introduction
1.1. Thermochemical Energy Storage
Energy storage systems have become increasingly important in the shift towards renewable energy because of the fluctuations inherent in renewable energy generation [1,2]. Thermochemical Energy Storage (TCES) stores and releases heat in the form of the chemical potential of a storage material through a reversible endothermic/exothermic chemical reaction. TCES compares favourably with sensible and latent heat storage [3,4,5] because it features a high energy density, low heat losses and the possibility to discharge the system at a relatively high and constant output temperature [6].
Numerous studies have been conducted on thermochemical energy storage in different materials, including calcium oxide [3,6], manganese oxide [7,8], barium oxide [9], zinc oxide [10], cobalt oxide/iron oxide [11], strontium bromide [12], calcium carbonate [13] and many more. In general, the chemical processes occurring in the storage material can be classified into redox reactions of metal oxides, carbonation/decarbonation of carbonates and hydration/dehydration of hydroxides [14]. The choice of material depends on many criteria, one of which is the application of the energy storage. For example, for integration with Concentrated Solar Power (CSP) plants, manganese oxide is not suitable because of its high reaction temperature [6,15]. Another important aspect to consider is the practicability of the process; in a calcium carbonate system, for example, the CO2 released by the reaction has to be liquefied, which results in a high parasitic loss [6,14]. Additionally, there are many more criteria to consider, such as cyclability, reaction kinetics, energy density and, most importantly, safety. For comprehensive reviews of the various storage materials, we refer to [14,15,16].
Recently, experimental investigations have been conducted specifically for the calcium oxide (CaO)/calcium hydroxide (Ca(OH)2) system. One experiment investigated the material parameters (such as heat capacity and density) and the reaction kinetics [17], another focused on the operating range, efficiency and cycling stability of the system [18], and a further experiment addressed the feasibility of integration with concentrated solar power plants [19]. All these experiments show that CaO/Ca(OH)2 is a very promising candidate as a TCES storage material. Furthermore, it is more attractive than other storage materials because it is nontoxic, relatively cheap and widely available [20,21]. The system stores heat (is charged) during the dehydration of Ca(OH)2, triggered by injecting dry air at a higher temperature. Charging results in an endothermic reaction along with the formation of H2O vapour and a lower temperature at the outlet. The system releases heat (is discharged) during the hydration of CaO. This is achieved by injecting air with higher humidity (H2O content) at a relatively lower temperature, resulting in an exothermic reaction (see Figure 1). Note that the hydration process occurs at a lower temperature than the dehydration process, but both processes occur at high operating temperatures [22]. The reversible reaction is written as:
$$\mathrm{Ca(OH)_2} + \Delta H \rightleftharpoons \mathrm{CaO} + \mathrm{H_2O}. \tag{1}$$
A robust operational control of this system needs an accurate and real-time capable model to predict its state of charge and health. Similar models are used operationally, for example for batteries in mobile devices [23,24]. Accordingly, numerical TCES modelling studies have been conducted to predict the system's behaviour [6,18,20,21,25]. However, the PDEs that describe the system are dynamic, highly nonlinear and strongly coupled, making the numerical simulation computationally expensive. This poses a significant hindrance to a more thorough and complex analysis of the TCES system. Estimating the system's state of health, for example, requires a 2D or 3D model to study the effect of structural changes due to agglomeration [26]. With increasing spatial dimension, the computational time also increases strongly. Consequently, the system will not be ready for commercial and industrial use until a faster yet accurate model is developed. In this work, we consider Artificial Neural Networks (ANNs) as a cheaper alternative to the expensive existing models.
1.2. Physics-Inspired Artificial Neural Networks
Artificial Neural Networks (ANNs) have been studied and applied intensively in the past few decades. They have become very popular alongside linear regression and other techniques such as Gaussian Process Regression (GPR) and Support Vector Machines (SVMs) [27]. ANNs are more flexible and better applicable to nonlinear problems than linear regression and GPR [28]. Additionally, they scale better to large datasets than SVMs [29]. However, a detailed performance comparison of ANNs with other machine learning techniques is out of the scope of this paper.
ANNs have a wide range of applications, such as image and pattern recognition, language processing, regression problems and data-driven modelling [30]. In this paper, we focus on data-driven modelling, where an ANN is trained to predict the physical behaviour of a TCES system based on available data. ANNs have been used for data-driven modelling in different fields. In hydrology [31], ANNs have been successfully applied, for example, to predict rainfall-runoff [32,33], groundwater levels [34] and groundwater contamination [35]. Moreover, ANNs have been used in energy system applications [36], for example to predict the performance [37] and reliability [38] of renewable energy systems and to support their design [39]. All these examples show that ANNs have the potential to be quick decision-making tools, useful for many engineering and management purposes.
In previous applications of ANNs to data-driven modelling, the ANN was treated as a black box [40,41] that learns only the mathematical relationship between input and output. In such a process, the physical relationships and scientific findings that were previously used to build the governing equations of the modelled systems are completely neglected. This is problematic because real data are noisy due to measurement errors, and fitting an ANN to noisy data without any physical constraint might lead to overfitting [30]. Additionally, in many cases, observation data are difficult and expensive to obtain, providing users with only a limited amount of data to train the ANN. Without any physical knowledge, ANNs perform poorly when trained with little data [42,43]. Furthermore, ANNs have very poor interpretability [44,45], meaning that there might be different combinations of ANN elements (width, depth, activation functions and parameters) that fit the training data with similar likelihood, but not all of them are physically meaningful and robust. As a result, ANN predictions might be misleading.
Implementing physical knowledge in the ANN structure and regularisation is a potential solution to this issue. By combining a black box model with a white box (fully physical) model, we obtain a grey box model. In such an approach, physical theories are used in combination with observed data to improve the model prediction and its plausibility [46]. Moreover, the data help to include complex physical processes that may not be captured by currently existing white box models. There are at least two motivations to do so: to obtain a reliable surrogate of the physics-based model for the sake of speed in real-time environments, and to address situations where the underlying physics of the system is incompletely understood, so that ANNs can build on, and later exceed, the current state of physical understanding.
Several works have been conducted to develop so-called Physics Inspired Neural Networks (PINNs). In general, PINNs can be grouped according to the two motivations mentioned above. The first group applies ANNs to infer the parameter values in the governing Ordinary Differential Equations (ODEs) or Partial Differential Equations (PDEs) as well as the constitutive relationships and the differential operators [43,47], assuming that the ODEs or PDEs perfectly describe the modelled system. The second group treats the system as a complex unit that is not sufficiently represented by simplified equations alone. It trains the ANN based on observation data while constraining the ANN using physically-based regularisation [42,48,49]. We aim mostly at the second motivation and ask how much of the useful knowledge contained in the PDEs can be used to inform ANNs before proceeding to train them on observation data.
Despite the success of PINN implementations in this direction, some open issues still need to be addressed: (I) There is no well-defined alignment between the structure of the governing equations and the structure of the ANN. For example, most, if not all, of the applications are for dynamic systems; nevertheless, the structures of the applied ANNs do not resemble the dynamic behaviour of the systems and do not consider recurrency. (II) The focus in PINN development is more on achieving high accuracy with a limited amount of training data than on improving the physical plausibility of the predictions. (III) The implementations of PINNs in previous works are mainly for relatively simple problems, and implementation for more complex problems (featuring multiple nonlinear coupled equations) has not been demonstrated yet. In our current study, we address these three open issues.
1.3. Approach and Contributions
For dynamic and complex systems with coupled nonlinear processes such as the TCES system, we need an advanced approach to solve them using ANNs. Our approach implements physical knowledge of the system in building the ANN as follows: (I) we use a Nonlinear Autoregressive Network with Exogeneous Inputs (NARX) structure, a form of Recurrent Neural Network (RNN), with deeper recurrence to account for the system's long-term time scales and nonlinear dynamics; (II) we train the network with its recurrence structure to improve the long-term predictions in the dynamic system; and (III) we add physical regularisation terms to the objective function of the training to enforce physical plausibility of the predictions.
NARX is suited to modelling equispaced time series of sequential (time-dependent) observations $y(t)$ [50,51]. There are several reasons why NARX is preferable to alternative ANN structures: the feedback loop included in NARX enables it to capture long-term dependencies [34], and the possibility to provide exogenous inputs improves the results compared to networks without them [52]. Thinking in terms of PDEs, the exogeneous inputs resemble time-dependent boundary conditions, and the feedback provides access to preceding time steps of the PDE solution. With deeper recurrence, even integro-differential equations can be resembled, which is important for hysteretic systems or for system descriptions on larger scales.
There are two different methods to train NARX, namely Series-Parallel (SP, also known as open-loop) and Parallel (P, also known as closed-loop). In SP training, each time step in the time series is used as an independent training example. This means the recurrency in the ANN structure is ignored, and the preceding data values from the time series are provided as feedback inputs instead of the predicted values. The feedback loop is closed only after completing the training to perform multistep-ahead predictions [52,53]. The independence of the training examples makes the training much easier; however, the trained network performs much worse after closing the feedback loop [52].
Most, if not all, studies conducted with NARX have used the SP structure to train the network. In this paper, we argue that P training resembles the dynamic system better. The reason is that P training optimizes the ANN exactly for the later prediction purpose over longer time horizons: it accounts for error propagation over time and for the time-dependency of the predictions between time steps. As a downside, it requires more time to train the network in P mode. We readily accept this trade-off, because once trained, the network can still calculate its outputs at high speed [54]. We also propose to use a deeper recurrence to train the network by feeding back predictions of multiple preceding time steps. This accounts for the nonlinearity of the system and for possible higher-order memory effects. In terms of PDE-governed systems, this corresponds to the time delay between system excitations at one system boundary and the system's reaction at a remote boundary.
Regularisation in training ANNs is useful to prevent overfitting. Here, in addition to the commonly used L2 regularisation, simple regularisation terms are added that align with the physics. Examples include, but are not limited to, monotonicity and non-negativity of the internal states (here, volume and mole fractions). This follows the works presented in [46,55]. We suggest using Bayesian Regularisation (BR) to optimally calculate the hyperparameters (normalising constants) of all terms in the loss function, unlike in previous works, where the hyperparameters were calibrated manually. Furthermore, regularisation terms with discretized balance equations are also used. This regularisation is a way to feed the training with the fundamental human knowledge previously used to build PDEs. It helps the network realise the extensive relationships between inputs (previous states and boundary conditions) and outputs (future states of the system) in complex problems and prevents physically implausible predictions.
To test our ANN framework, a Monte Carlo ensemble of numerically simulated time series of system states is used, which we generate from random samples of uncertain system parameters. White noise is then added to the simulation results to emulate the actual noisy measurement data. Then, this ensemble (both parameters and time series) is used to train the network. We use synthetic data instead of experimental data because the former allow more exhaustive and controlled testing; this does not imply that our main purpose is only surrogate modelling. As optimization algorithm for training, the Levenberg–Marquardt (LM) algorithm [56,57,58] is implemented to obtain an optimum set of NARX parameters (which consist of the so-called weights and biases).
This paper is organised as follows: in Section 2, we introduce the governing equations used for the numerical simulation of the TCES internal states, the alignment between the dynamics of the CaO/Ca(OH)2 system and the NARX structure, as well as how we implement the physical knowledge in the regularisation. In Section 3, we discuss the results of our tests, and Section 4 concludes the work.
2. Materials and Methods
2.1. Governing Equations
This study serves as an initial step towards enabling a more complex analysis of the TCES system, focusing on predictions of the system's dynamic internal states that change during the endothermic/exothermic reaction process. The analysis of the system's integration with the energy source is out of the scope of this paper.
To set up the prediction model, we consider the CaO/Ca(OH)2 TCES lab-scale reactor of 80 mm length along the flow direction as described in [20]. Assuming the system properties and parameters to be homogeneous, the simulation was conducted in 1D. The system was modelled as nonisothermal single-phase multicomponent gas flow in a porous medium with the chemical reaction acting as source/sink terms, and it can be described using mole and energy balance equations. The inlet temperature and outlet pressure were fixed and defined with Dirichlet boundary conditions, and Neumann conditions were used to define the gas injection rates. The solid components forming the porous material are CaO and Ca(OH)2, and the gases are H2O and N2. The latter serves as an inert component to regulate the H2O mole fraction in the injected gas. A detailed explanation can be found in [20]; here we give only a brief overview.
The mole balance equation was formulated for the solid component (subscript s) as:
$$\frac{\partial (\rho_{n,s}\, \nu_s)}{\partial t} = q_s, \tag{2}$$
where $\rho_n$ denotes the molar density, $\nu$ the volume fraction, $q$ the source/sink term and $t$ the time; the subscript $n$ refers to molar properties. Note that $\nu_s$ is the volume fraction of each solid component with regard to the full control volume, and therefore $\sum_s \nu_s = 1 - \phi$. In Equation (2), there is no effect of advection or diffusion (no fluxes) because the solid is assumed to be immobile; the change of the solid components is caused solely by the chemical reaction through the reaction source/sink term $q_s$. The change in the gas components (subscript g), however, is affected both by advective and diffusive mass transfer and by a source/sink term for the reactive component H2O, as defined in the mole balance equation:
$$\frac{\partial (\rho_{n,g}\, x_g\, \phi)}{\partial t} - \nabla \cdot \left( \rho_{n,g}\, x_g \frac{K}{\mu} \nabla p + D\, \rho_{n,g} \nabla x_g \right) = q_g. \tag{3}$$
Here, $x$ denotes the molar fraction, $\phi$ the porosity, $K$ the absolute permeability of the porous medium, $\mu$ the gas viscosity, $p$ the pressure and $D$ the effective diffusion coefficient.
The energy balance equation was formulated assuming local thermal equilibrium. It accounts for the internal energy change of both the solid and the gas phase, convective and conductive heat fluxes, as well as a source/sink term from the reaction. It was defined as:
$$\frac{\partial (\rho_{m,g}\, u_g\, \phi)}{\partial t} + \sum_s \frac{\partial (\nu_s\, \rho_{m,s}\, c_{p,s}\, T)}{\partial t} + \nabla \cdot \left( \rho_{m,g}\, h_g \frac{K}{\mu} \nabla p \right) - \nabla \cdot (\lambda_{\mathrm{eff}} \nabla T) = q_e, \tag{4}$$
where $\rho_m$ is the mass density, $u_g$ the gas specific internal energy, $c_{p,s}$ the specific heat capacity of the solid material (CaO and Ca(OH)2), $T$ the temperature, $h_g$ the gas specific enthalpy and $\lambda_{\mathrm{eff}}$ the average thermal conductivity of the solid materials and gas components.
Reaction rates must be specified to determine the source/sink term for each equation. Based on [20,21], simple reaction kinetics were used, described as:
$$\hat{\rho}_{m,SR} = \begin{cases} -x_{H_2O}\,\bigl(\rho_{m,Ca(OH)_2} - \rho_{m,SR}\bigr)\, k_{RH}\, \dfrac{T - T_{eq}}{T_{eq}}, & \text{if } T < T_{eq},\\[6pt] -\bigl(\rho_{m,SR} - \rho_{m,CaO}\bigr)\, k_{RD}\, \dfrac{T - T_{eq}}{T_{eq}}, & \text{if } T > T_{eq}, \end{cases} \tag{5}$$
where $\hat{\rho}_{m,SR}$ is the mass reaction rate, $k_{RH}$ and $k_{RD}$ are the hydration and dehydration reaction constants, respectively, and $T_{eq}$ is the equilibrium temperature. The hydration process occurs when $T < T_{eq}$; it is also called the discharge process and is the exothermic part of the reaction. The dehydration process occurs when $T > T_{eq}$; it is also known as the charge process and is the endothermic part of the reaction. At the beginning of each reaction, the storage device is assumed to be in chemical equilibrium, corresponding to $\nu_{Ca(OH)_2} = 0$ and $\nu_{CaO} = 0$ for hydration and dehydration, respectively.
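For illustration, the following minimal Python sketch evaluates the kinetics of Equation (5) exactly as written above; the function and argument names are ours, and the numbers in the usage line are only loosely based on Table A1.

```python
def reaction_rate(T, T_eq, x_H2O, rho_m_SR, rho_m_CaOH2, rho_m_CaO, k_RH, k_RD):
    """Mass reaction rate of Eq. (5): hydration for T < T_eq, dehydration for T > T_eq."""
    if T < T_eq:    # hydration (discharge, exothermic)
        return -x_H2O * (rho_m_CaOH2 - rho_m_SR) * k_RH * (T - T_eq) / T_eq
    if T > T_eq:    # dehydration (charge, endothermic)
        return -(rho_m_SR - rho_m_CaO) * k_RD * (T - T_eq) / T_eq
    return 0.0      # chemical equilibrium at T = T_eq

# Example call for a dehydration state (values roughly in the range of Table A1).
print(reaction_rate(T=750.0, T_eq=700.0, x_H2O=0.1, rho_m_SR=2000.0,
                    rho_m_CaOH2=2200.0, rho_m_CaO=1656.0, k_RH=0.2, k_RD=0.05))
```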
The relation between the reaction rate and the source/sink terms of the mole balance equations was defined as:
$$q_{H_2O} = q_{CaO} = -q_{Ca(OH)_2} = \frac{\hat{\rho}_{n,SR}}{1 - \phi}, \tag{6}$$
with $\hat{\rho}_{n,SR}$ the molar reaction rate (obtained from $\hat{\rho}_{m,SR}$ using the molar mass of the respective component). The energy balance source/sink term $q_e$ was calculated accounting for the reaction enthalpy $\Delta H$ and the volume expansion work [59] according to:
$$q_e = -\hat{\rho}_{n,SR}\left(\Delta H - \frac{\phi}{1-\phi}\,\frac{p}{\rho_{n,g}}\right). \tag{7}$$
Note that the negative sign is necessary in the calculation of $q_e$, so that its value is proportional to $q_{Ca(OH)_2}$ and of opposite sign to $q_{H_2O}$ and $q_{CaO}$. This sign can be explained by the fact that forming Ca(OH)2 from CaO and H2O during the hydration process releases energy into the system. Correspondingly, a decrease in the molar amounts of CaO and H2O (and an increase in the molar amount of Ca(OH)2) results in a positive energy source term. The opposite holds for the dehydration process.
2.2. Input and Output Variables
The numerical model used in this work was developed using DuMux (Distributed and Unified Numerics Environment for Multi-{Phase, Component, Scale, Physics, ...} [60]). As input to the simulator, we need the material parameters such as the CaO density ($\rho_{CaO}$), Ca(OH)2 density ($\rho_{Ca(OH)_2}$), CaO specific heat capacity ($c_{p,CaO}$), Ca(OH)2 specific heat capacity ($c_{p,Ca(OH)_2}$), CaO thermal conductivity ($\lambda_{CaO}$) and Ca(OH)2 thermal conductivity ($\lambda_{Ca(OH)_2}$); porous medium parameters such as the absolute permeability ($K$) and porosity ($\phi$); reaction kinetics parameters such as the reaction rate constant ($k_R$) and specific reaction enthalpy ($\Delta H$); and initial and boundary conditions such as the N2 molar inflow rate ($\dot{n}_{N_2,in}$), H2O molar inflow rate ($\dot{n}_{H_2O,in}$), initial pressure ($p_{init}$), outlet pressure ($p_{out}$), initial temperature ($T_{init}$), inlet temperature ($T_{in}$) and initial H2O mole fraction ($x_{H_2O,init}$).
In the TCES application, one of the main goals is to estimate the state of charge of the device, which is implied in the CaO volume fraction $\nu_{CaO}$: the fully charged condition corresponds to $\nu_{CaO} = 1$ and vice versa. We are also interested in the output variables $p$, $T$ and $x_{g,H_2O}$ (H2O mole fraction). The behaviour of these variables, especially $p$, is very nonlinear, so it is interesting to see how well the ANN predicts them; additionally, these variables are important for understanding the system. Therefore, our main output variables of interest were collected in the following vector $y$ as a function of time $t$:
$$y(t) = \bigl[\, p(t),\; T(t),\; \nu_{CaO}(t),\; x_{g,H_2O}(t) \,\bigr]^{\mathsf T}. \tag{8}$$
All input-output data samples are available as supplementary materials on https://doi.org/10.18419/darus-633.
2.3. Aligning the ANN Structure with Physical Knowledge of the System
The ANN representation via NARX has two different training architectures, namely the Series-Parallel (SP) and the Parallel (P) structure. The network output $\hat{y}_{SP}(t+1)$ of the SP structure is a function of the observed target values of previous time steps $y(t)$ up to a feedback delay $d_y$ and of the so-called exogenous inputs $u$:
$$\hat{y}_{SP}(t+1) = f\bigl(y(t), y(t-1), \ldots, y(t-d_y), u\bigr). \tag{9}$$
In this work, $u$ was assumed to be constant over time, meaning there is no disturbance signal throughout the whole simulation period.
In the P structure, the difference lies in the fed-back values. Here, the network outputs of the P structure $\hat{y}_P(t)$ are fed back instead of the originally given data $y(t)$:
$$\hat{y}_{P}(t+1) = f\bigl(\hat{y}(t), \hat{y}(t-1), \ldots, \hat{y}(t-d_y), u\bigr). \tag{10}$$
Note that, in terms of notation, the difference lies only in the hats above the fed-back values. Evidently, the P structure of NARX resembles an explicit time-discrete differential equation (ODE or PDE) in a simplistic case, for example using the Adams–Bashforth discretization scheme [61,62], which can be written as:
$$\hat{y}(t+1) \approx \hat{y}(t) + \Delta t \cdot g\bigl(u, \hat{y}(t), \hat{y}(t-1), \ldots, \hat{y}(t-d_y)\bigr), \tag{11}$$
where $\hat{y}(t+1)$ is an explicit function of $\hat{y}(t), \ldots, \hat{y}(t-d_y)$. In Equation (10), the NARX function $f\bigl(\hat{y}(t), \hat{y}(t-1), \ldots, \hat{y}(t-d_y), u\bigr)$ can be seen as an approximation of $\hat{y}(t) + \Delta t \cdot g\bigl(u, \hat{y}(t), \hat{y}(t-1), \ldots, \hat{y}(t-d_y)\bigr)$ in Equation (11). For this reason, we propose to train using the P structure for dynamic problems whenever possible. Additionally, training in P structure helps the network to learn that the predicted values at different time steps depend on each other. While both architectures were considered for NARX training, only the P architecture was used for testing, because for longer-term forecasting, real data of previous time steps are not available [52]. Figure 2 illustrates both architectures, and a minimal code sketch of the two prediction modes is given below.
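The following Python sketch contrasts the two prediction modes for a generic NARX map $f$; the toy linear $f$ stands in for the trained network and is purely illustrative.

```python
import numpy as np

def predict_open_loop(f, y_obs, u, d_y):
    """SP mode (Eq. (9)): observed values of the previous d_y + 1 steps feed the network."""
    preds = []
    for t in range(d_y, len(y_obs) - 1):
        history = np.concatenate(y_obs[t - d_y:t + 1])
        preds.append(f(history, u))
    return np.array(preds)

def rollout_closed_loop(f, y_init, u, n_steps, d_y):
    """P mode (Eq. (10)): the network's own predictions are fed back over time."""
    y_hat = list(y_init)                          # d_y + 1 initial states
    for _ in range(n_steps):
        history = np.concatenate(y_hat[-(d_y + 1):])
        y_hat.append(f(history, u))
    return np.array(y_hat)

# Toy linear stand-in for the trained NARX map (4 targets, 3 exogenous inputs).
rng = np.random.default_rng(0)
d_y, n_out, n_u = 5, 4, 3
W = rng.normal(scale=0.1, size=(n_out, (d_y + 1) * n_out + n_u))
f = lambda history, u: W @ np.concatenate([history, u])

y_obs = [rng.normal(size=n_out) for _ in range(20)]   # "observed" series
u = rng.normal(size=n_u)
print(predict_open_loop(f, y_obs, u, d_y).shape)                    # one-step forecasts
print(rollout_closed_loop(f, y_obs[:d_y + 1], u, 10, d_y).shape)    # multistep forecast
```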
The feedback delay is also an important property: Equations (2)–(4) are not elliptic, and hence there is a time delay before a change in the inputs affects the outputs. Because of this memory effect and the system's nonlinearity, we propose a deeper recurrence in NARX to enable the network to learn the system's latency. In this work, feedback delay values $d_y$ ranging from 1 to 5 were tested to find the optimum value.
In addition to an appropriate ODE-like structure, the hyperbolic tangent (tanh) function was chosen as the activation function within the neurons of NARX. Tanh is a nonlinear activation function (named tansig in MATLAB [63]). Aside from the nonlinearity, this choice was driven by the assumption that all input parameters and targets depend on each other via smooth (differentiable) functions. This knowledge results, among others, from the presence of a diffusion term in all relevant transport equations and from the absence of shock waves in the solutions to Equations (2)–(4). Hence, for each hidden layer $l$ as shown in Figure 2, the layer output $a^{[l]}$ is computed in the feedforward procedure described by:
$$a^{[l]} = \tanh\bigl(w^{[l]}\, a^{[l-1]} + b^{[l]}\bigr). \tag{12}$$
For the output layer L, a linear function is assigned for scaling, leading to:
$$\hat{y}^{[L]} = w^{[L]}\, a^{[L-1]} + b^{[L]}. \tag{13}$$
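As an illustration of Equations (12) and (13), a minimal NumPy sketch of the feedforward pass is given below. The layer sizes follow the 15- and 8-node hidden layers used later, while the input layout (fed-back outputs of $d_y + 1$ time steps plus 17 exogenous inputs) and all names are our own assumptions rather than the toolbox implementation.

```python
import numpy as np

def narx_forward(x, weights, biases):
    """Feedforward pass: tanh hidden layers (Eq. (12)) and a linear output layer (Eq. (13)).
    x stacks the fed-back outputs of the previous d_y + 1 time steps and the exogenous inputs u."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)                 # hidden layers, Eq. (12)
    return weights[-1] @ a + biases[-1]        # linear output layer, Eq. (13)

# Hypothetical configuration: (d_y + 1) * 4 fed-back values + 17 exogenous inputs,
# two hidden layers with 15 and 8 nodes, and 4 output targets [p, T, nu_CaO, x_H2O].
rng = np.random.default_rng(1)
sizes = [(5 + 1) * 4 + 17, 15, 8, 4]
weights = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(narx_forward(rng.normal(size=sizes[0]), weights, biases))
```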
2.4. Physical Constraints in the Training Objective Function
As the loss function for training, the most commonly used performance measure is the Mean Squared Error (MSE). Equation (14) shows the MSE for $n$ training datasets (here, we use the subscript D for "data" to label the error term $E_D$ in the loss function as the data-related term):
$$E_D = \frac{1}{n \cdot n_t} \sum_{i=1}^{n} \sum_{t=1}^{n_t} \bigl(y_{i,t} - \hat{y}_{i,t}\bigr)^2, \tag{14}$$
where $i = 1 \ldots n$ indicates a specific sample in the training dataset and $t = 1 \ldots n_t$ indicates a specific time step. However, using the MSE alone in the loss function is often not enough: the optimization problem to be solved in training is typically ill-posed [30]. Thus, regularisation is required to prevent overfitting.
In this work, the L2 regularisation method was used to increase the generalisation capability of the ANN [64]. L2 regularisation is also known as weight decay or ridge regression [65]. The goal of L2 regularisation is to force the network to have small parameter values (choosing the simpler network over the more complex one). This effectively adds a soft constraint to the loss function to prevent the network from blindly fitting the possible noise in training datasets:
$$E_\theta = \frac{1}{N} \sum_{j=1}^{N} \theta_j^2, \tag{15}$$
where $N$ is the total number of network parameters (weights and biases) and $\theta \in \mathbb{R}^N$ are the network parameters.
To improve the network prediction and the physical plausibility even more, known physical laws were inserted as part of the network regularisation:
$$E_{phy,k} = \frac{1}{n \cdot n_t} \sum_{i=1}^{n} \sum_{t=1}^{n_t} \bigl(e_{phy,i,t,k}\bigr)^2, \tag{16}$$
where the subscript $k$ identifies a specific physical law, for example a mole balance equation, and $e_{phy,i,t,k}$ is the physical error listed in Table 1. For example, the term $e_{phy,i,t,1}$ corresponds to the mole balance equation for dehydration/hydration. The mole balance equation used for this regularisation is the H2O mole balance, because it has the most complete set of storage, flux and source/sink terms (the solid components are assumed to be immobile, and N2 is inert). The mole balance error can be written as:
$$e_{MB}(i,t) = n_{H_2O,out}(i,t) - n_{H_2O,in} + \Delta n_{H_2O,sto}(i,t) - \Delta n_{H_2O,q}(i,t), \tag{17}$$
where $n_{H_2O}$ is the molar amount of H2O and the subscripts $out$, $in$, $sto$ and $q$ denote the outflow, inflow, storage and source/sink term, respectively. The mole balance error was used as the constraint $e_{phy,i,t,1}$ and is equal to 0 if the mole balance is fulfilled. Using this equation as a regularisation term penalises the network if the mole balance is not satisfied. Similarly, the corresponding energy balance equation also has to be fulfilled:
$$e_{EB}(i,t) = Q_{out}(i,t) - Q_{in} + \Delta Q_{sto}(i,t) - \Delta Q_{q}(i,t), \tag{18}$$
where $Q$ is the energy in the system. It was used as the regularisation term $e_{phy,i,t,2}$. A more detailed derivation of the mole balance error $e_{phy,i,t,1}$ and the energy balance error $e_{phy,i,t,2}$ can be found in Appendix B.
Further relations of the form $F(\hat{y}) \le 0$ (monotonicity and non-negative values) were implemented using the Rectified Linear Unit (ReLU) function, so that the physical error was calculated as $e_{phy,i,t,k} = \mathrm{ReLU}\bigl(F(\hat{y})\bigr)$ [46]. The ReLU function returns 0 for negative arguments and increases linearly for positive arguments; hence, it punishes positive values (i.e., constraint violations) in proportion to their magnitudes. Examples of these ReLU constraints are $e_{phy,i,t,3}$ through $e_{phy,i,t,9}$. They define the non-negativity and monotonicity of the predicted target variables. For both the dehydration and the hydration process, negative fractional values $\hat{\nu}_{CaO}$ and $\hat{x}_{H_2O}$ are physically and mathematically impossible; therefore, in $e_{phy,i,t,3}$ and $e_{phy,i,t,4}$, the network is punished for predicting negative values for these targets. Additionally, for both processes, $e_{phy,i,t,5}$ provides an additional constraint on $\hat{\nu}_{CaO}$ to limit the CaO volume in relation to the porosity ($\hat{\nu}_{CaO} \le 1 - \phi$). All these monotonicity assumptions originate from the fact that the system's material parameters are considered constant throughout the operation of the system; therefore, the system's behaviour should be monotonic and bounded in the specified aspects. Specifically for the dehydration process, $\hat{p}$, $\hat{T}$ and $\hat{\nu}_{CaO}$ are expected not to decrease throughout the simulation, resulting in the corresponding monotonicity constraints $e_{phy,i,t,6}$ to $e_{phy,i,t,8}$. The system temperature must also be lower than or equal to the injected temperature, as constrained in $e_{phy,i,t,9}$, because the injected temperature is higher than the initial temperature. For the hydration process, the constraints $e_{phy,i,t,6}$ to $e_{phy,i,t,9}$ are reversed, because the hydration process is the reverse of the dehydration process. A sketch of these penalty terms is given below.
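To make the construction concrete, the sketch below evaluates the ReLU penalties of Table 1 for one predicted dehydration series; the averaging convention and all argument names are illustrative assumptions, not the toolbox code.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_penalties_dehydration(p_hat, T_hat, nu_CaO_hat, x_H2O_hat, T_in, phi):
    """ReLU constraint errors e_phy,3..9 of Table 1 (dehydration column) for one
    predicted time series; each array holds the prediction at all time steps."""
    e = [
        relu(-nu_CaO_hat),                        # e_3: non-negative CaO volume fraction
        relu(-x_H2O_hat),                         # e_4: non-negative H2O mole fraction
        relu(phi + nu_CaO_hat - 1.0),             # e_5: nu_CaO <= 1 - phi
        relu(p_hat[:-1] - p_hat[1:]),             # e_6: pressure non-decreasing
        relu(T_hat[:-1] - T_hat[1:]),             # e_7: temperature non-decreasing
        relu(nu_CaO_hat[:-1] - nu_CaO_hat[1:]),   # e_8: nu_CaO non-decreasing
        relu(T_hat - T_in),                       # e_9: T below the injected temperature
    ]
    # Each list entry enters the loss as a mean of squared violations, cf. Eq. (16).
    return [np.mean(ek ** 2) for ek in e]
```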
2.5. Obtaining Optimum Network Parameters
The complete loss function defined in Section 2.4, including the MSE and all regularisation terms, is written as:
$$L(\theta) = \alpha E_\theta + \beta E_D + \sum_k \lambda_k\, E_{phy,k}. \tag{19}$$
Here, $\alpha$ and $\beta$ are the normalising constants of $E_\theta$ and $E_D$, respectively, and $\lambda_k$ is the normalising constant of each physical regularisation term $k$. All error and regularisation terms are therefore evaluated in a normalised metric. These normalising constants, also known as hyperparameters, determine the importance given to each term. For example, a high $\beta$ means that it is more important for the network to fit the training datasets than to generalise well. In many cases, the hyperparameters are determined manually by trial and error. In this work, Bayesian Regularisation was adopted to calculate all of them with a maximum likelihood approach that minimises $L(\theta)$ [66,67]. Bayesian Regularisation reduces the subjectivity arising from a manual choice of hyperparameters. First, the hyperparameters $\alpha$, $\beta$ and $\lambda_k$ along with the network parameters $\theta$ were initialised. The hyperparameters were initialised by setting $\beta = 1$, $\alpha = 0$ [68] and $\lambda_k = 0$, while the network parameters were initialised using the Nguyen-Widrow method [69,70]. The Nguyen-Widrow method initialises the network parameters so that each neuron contributes to a certain interval of the whole output range (with some added random values).
The complete derivation for updating $\alpha$ and $\beta$ can be found in [66,67]; here, we only give the calculation of $\lambda_k$. After each iteration, it is updated according to the following relation:
$$\lambda_k := \frac{N - \lambda_k\, \mathrm{Trace}\bigl(H^{-1} J_{phy,k}^{\mathsf T} J_{phy,k}\bigr)}{2\, E_{phy,k}}, \tag{20}$$
where $J_{phy,k}$ is the Jacobian of the physical error $E_{phy,k}$ with respect to the network parameters $\theta$. The approximate Hessian matrix $H$ of the overall loss function $L(\theta)$ was defined as:
$$H = \alpha I + \beta\, J_D^{\mathsf T} J_D + \sum_k \lambda_k\, J_{phy,k}^{\mathsf T} J_{phy,k}, \tag{21}$$
where $J_D$ is the Jacobian of the MSE ($E_D$) with respect to the network parameters and $I$ is the $N \times N$ identity matrix ($N$ is the number of network parameters $\theta$).
The network parameters are also updated after each iteration according to the Levenberg–Marquardt algorithm:
$$\theta := \theta - (H + I\mu)^{-1}\Bigl(\alpha\theta + \beta\, J_D\, E_D + \sum_k \lambda_k\, J_{phy,k}\, E_{phy,k}\Bigr), \tag{22}$$
where $\mu > 0$ is the algorithm's damping parameter. Its value is increased when an iteration step is not successful and decreased otherwise. The Levenberg–Marquardt algorithm was chosen because of its faster convergence compared to Steepest Descent and its higher stability relative to the Gauss–Newton algorithm [58]. In the absence of our physical regularisation terms and with fixed $\alpha$, $\beta$, the procedure would simplify to plain (nonlinear) least squares training, which would be the standard approach for training ANNs. The values of the trained network parameters and the normalising constants at the end of the training are given as supplementary materials on https://doi.org/10.18419/darus-634.
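The sketch below assembles one such training iteration, combining the approximate Hessian (Eq. (21)), the parameter update (Eq. (22)) and the hyperparameter update (Eq. (20)). The residual vectors and Jacobians are assumed to be supplied by backpropagation (through time); the shape and normalisation conventions are our own, and the snippet is not the MATLAB toolbox implementation.

```python
import numpy as np

def lm_br_step(theta, e_D, J_D, e_phy, J_phy, alpha, beta, lam, mu):
    """One sketched iteration: Levenberg-Marquardt parameter update with a
    Bayesian-regularisation update of the physical hyperparameters lambda_k.
    e_D / e_phy[k] are residual vectors, J_D / J_phy[k] their Jacobians w.r.t.
    theta (rows: residuals, columns: parameters)."""
    N = theta.size
    # Approximate Hessian of the loss, Eq. (21).
    H = alpha * np.eye(N) + beta * (J_D.T @ J_D)
    H = H + sum(l * (Jk.T @ Jk) for l, Jk in zip(lam, J_phy))
    # Gradient of the loss (written as J E in Eq. (22); J^T e in this shape convention).
    grad = alpha * theta + beta * (J_D.T @ e_D) + sum(
        l * (Jk.T @ ek) for l, ek, Jk in zip(lam, e_phy, J_phy))
    # Damped (Levenberg-Marquardt) parameter update, Eq. (22).
    theta_new = theta - np.linalg.solve(H + mu * np.eye(N), grad)
    # Bayesian-regularisation update of each lambda_k, Eq. (20),
    # with E_phy,k taken here as the sum of squared residuals.
    H_inv = np.linalg.inv(H)
    lam_new = [(N - l * np.trace(H_inv @ Jk.T @ Jk)) / (2.0 * np.sum(ek ** 2))
               for l, ek, Jk in zip(lam, e_phy, J_phy)]
    return theta_new, lam_new

# Toy usage with random placeholders for the residuals and Jacobians.
rng = np.random.default_rng(2)
N, n_res = 20, 50
theta, lam = lm_br_step(rng.normal(size=N), rng.normal(size=n_res),
                        rng.normal(size=(n_res, N)), [rng.normal(size=n_res)],
                        [rng.normal(size=(n_res, N))], alpha=0.0, beta=1.0,
                        lam=[0.0], mu=1e-3)
```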
3. Results and Discussion
Our hypothesis is that applying physical knowledge of the modelled system into the construction of ANNs would lead to an improved physical plausibility of the prediction results. In this section, the prediction of the TCES system using ANNs is assessed and three relevant aspects that support our hypothesis are discussed: (I) the effect of feedback delay on the prediction result to account for the system’s nonlinearity and long-term memory effect (Section 3.1), (II) the comparison between training in SP and P architecture (Section 3.2) and (III) the improved physical plausibility from using physical regularisation (Section 3.3). The results are illustrated only for the dehydration process, because the hydration provides very similar results.
The complete workflow of the ANN application is shown in Figure 3. In general, the workflow can be divided into training, validation and testing of the ANN. To train the ANN, first an ensemble of exogeneous inputs $u$ was generated based on selected probability distributions. These distributions are based on different values used in the literature [6,17,18,19,20,21]. The complete list of exogeneous inputs $u$ with their corresponding distributions is given in Appendix A. This ensemble was then plugged into the numerical model in DuMux and simulated until $t = 5000$ s to obtain an ensemble of target data $y(t)$. The governing equations are provided in Section 2.1. White noise was then added to these targets by generating normally distributed random numbers with zero mean and a standard deviation of 0.05 times the target values. Lastly, both exogenous inputs and targets were normalised to the range [−1, 1] to help the stability of the training [71]. Then, we set up the NARX ANN as described in Section 2. The training was then conducted using the built-in functions for NARX in the MATLAB Neural Network Toolbox [63], in which the loss function calculation was modified based on the equations provided in Section 2.5. It was conducted in batch mode for both the dehydration and the hydration process with a total of 100 training datasets.
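The data preparation step described above can be sketched as follows; the per-target min-max scaling and the array layout are our own assumptions about how the noisy targets are brought to the range [−1, 1].

```python
import numpy as np

def add_noise_and_normalise(Y, noise_level=0.05, rng=None):
    """Add zero-mean white noise with a standard deviation of noise_level times the
    target values and scale each target to [-1, 1]. Y has shape
    (n_samples, n_timesteps, n_targets); the bounds are returned so that network
    predictions can later be rescaled to physical units."""
    rng = rng or np.random.default_rng()
    Y_noisy = Y + rng.normal(scale=noise_level * np.abs(Y))
    y_min = Y_noisy.min(axis=(0, 1), keepdims=True)
    y_max = Y_noisy.max(axis=(0, 1), keepdims=True)
    Y_scaled = 2.0 * (Y_noisy - y_min) / (y_max - y_min) - 1.0
    return Y_scaled, (y_min, y_max)

# Example with a dummy ensemble of 100 series, 1000 time steps and 4 targets.
Y = np.random.default_rng(4).uniform(1.0, 2.0, size=(100, 1000, 4))
Y_scaled, bounds = add_noise_and_normalise(Y)
```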
Without physical regularisation, we obtain the lowest MSE value when the NARX is trained using 1000 training datasets, as shown in Figure 4. However, it is interesting to see how physical knowledge can further improve the performance of NARX with limited training data. Therefore, we deliberately limited the training, for both the dehydration and the hydration process, to only 100 training datasets.
For conciseness, the choice of the number of hidden layers, the number of nodes per hidden layer and the activation function is not discussed in detail, because there is no uniform and systematic method to determine the best combination [72]. Based on trial and error, we found that for this specific problem, 2 hidden layers with 15 and 8 nodes, respectively, give reasonable results. An example of an ANN prediction using this configuration is provided in Appendix C. The stopping criteria are the damping factor $\mu > 10^{10}$ (see Equation (22)) or the loss function gradient $g = \partial L(\theta)/\partial \theta < 10^{-7}$, both of which are the default values proposed in the toolbox. Additionally, a maximum number of epochs is set for the training; since the training error mostly converges before 500 epochs, this limit is sufficient.
Different initialisation values often lead to different network responses, as the algorithm might fail to always locate the global minimum [65]. Therefore, the network was retrained with 50 different initialisations. This number was chosen as a compromise between a reliable result and a reasonable computational time. After each training, the network was validated on 20 validation datasets, and the trained network with the lowest MSE ($E_D$ evaluated against the validation data) was selected. Finally, the network was tested on a test set of 800 time series contained in neither the training nor the validation set.
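Schematically, this retraining and model-selection loop looks as follows; train_fn, init_fn and validate_mse are placeholders for the actual toolbox routines, not real API names.

```python
import numpy as np

def train_with_restarts(train_fn, init_fn, validate_mse, n_restarts=50, seed=0):
    """Retrain from several random initialisations and keep the network with the
    lowest validation MSE."""
    rng = np.random.default_rng(seed)
    best_params, best_mse = None, np.inf
    for _ in range(n_restarts):
        params = train_fn(init_fn(rng))    # train from a fresh initialisation
        mse = validate_mse(params)         # E_D evaluated on the validation series
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse

# Dummy usage with stand-in callables.
params, mse = train_with_restarts(lambda p: p, lambda rng: rng.normal(size=5),
                                  lambda p: float(np.sum(p ** 2)))
```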
3.1. Influence of Feedback Delay
Figure 5 shows the influence of the feedback delay on the MSE evaluated on both the training (dashed lines) and test datasets (solid lines) for networks trained using the P structure. As shown in Figure 5, for $d_y > 1$ the test MSE is lower than the training MSE. This is mainly because the network was trained on data with added white noise, which produces larger errors in training, while the test datasets were smooth reference solutions.
In Figure 5, while the training error seems to remain constant at $d_y > 2$, the test error keeps decreasing with increasing $d_y$. This clearly illustrates that including more depth in the recurrence improves the generalisation capability and therefore improves the ANN prediction. As the best $MSE_{test}$ was obtained with $d_y = 5$, this value is used from here onwards. Figure 5 also indicates that a time delay of at least 3 previous time steps is useful to train the network. Moreover, we see no value in using $d_y > 5$: judging by the MSE trend in Figure 5, no significant further improvement is to be expected.
3.2. SP Versus P Training Structure
Figure 6 compares the training times of the P and SP structures. Moreover, plotting the gradient and the performance as functions of the training epochs allows us to analyse the differences between the two training characteristics. As expected and shown in Figure 6 (dashed lines), the training time of both the SP and P structure increases nearly linearly with the number of epochs (iterations). However, the slope is steeper for P training, which is caused by the higher computational cost of Backpropagation Through Time (BPTT).
In recurrent ANNs, BPTT is used to calculate the derivative of the loss function instead of the normal backpropagation method. BPTT is technically the same as normal backpropagation, the main difference being that the RNN is unfolded through time [30]. The gradient is then backpropagated through this unfolded network. Unfolding the recurrent network increases its size, and therefore the optimization problem becomes computationally more expensive and more difficult to solve.
After every epoch in P training, the output values change, and consequently the feedback values also change. The constantly changing feedback values cause additional changes of the gradient values along the iterations (dotted lines in Figure 6). This makes the training a more nonlinear problem. Correspondingly, the training performance (smaller MSE) improves much more slowly. In SP training, the gradient strongly decreases during the first 20 epochs, showing that SP training is computationally more stable. However, the MSE (solid lines in Figure 6) was evaluated during training for the structure the network was trained with, meaning that the training performance does not consider the closed-loop conversion error for SP training. For that reason, the MSE values shown in Figure 6 seem to be better for SP training. Regardless of this difference, both training procedures converge with a strictly monotonic decay of their MSE.
Next, the prediction performances of both training architectures are compared. In Figure 7, the results of the SP-trained NARX (dashed lines) are shown together with the target values obtained from the simulation (solid lines) as reference solutions. Here, all target variables are shown for different inlet temperatures. After a few time steps with relatively precise forecasts, the NARX predictions for $T$, $p$ and $\nu_{CaO}$ diverge from the reference values and fluctuate strongly over time, which is nonphysical. Note that, in Figure 7, the results are shown only up to time step 100 instead of 1000, because the NARX predictions after time step 100 fluctuate even more strongly as the error propagates, making a visual comparison unclear. The forecast for $x_{g,H_2O}$ is reasonably accurate for $t < 100$ but more erroneous for longer forecast periods. The forecast error is caused by the closed-loop instability, that is, the inaccuracy caused by converting the network structure from SP to P. In other words, training using the SP structure gives increasingly erroneous results with an increasing time horizon. In contrast, training with the P structure provides clearly more accurate forecasts, as shown in Figure 8. The NARX predictions (dotted lines) agree closely with the reference targets (blue solid lines) throughout the whole simulation time of 1000 time steps, with inlet temperatures covering the whole range of the input distribution.
The comparison of P and SP structure shows that, while training in P structure seems to be more unstable, it provides significantly better long-term predictions because it trains the network to realise the time-dependency of output variables in a dynamic system. As shown by Equation (11), NARX resembles an explicitly discretized ODE, which is known to be conditionally stable. In cases where the discrete ODE is unstable, the error grows exponentially through time [73]. By training the network using P structure, the same structure is used for both training and testing, hence minimising the error propagation. The higher computing time needed to train in P structure (in comparison with training in SP structure) should not be a problem because the training only needs to be conducted once. Once the optimum network parameters are obtained, NARX can give reasonably accurate predictions with fast computational time. In fact, almost all studies in the area of surrogate models are willing to accept high computational costs during training (called offline costs) if the accuracy and speed for prediction (online) are good [54,74,75].
3.3. Physical Regularisation Improves Plausibility
To test the benefit of physically-based regularisation, the ANN performance is compared between training the network using:
- only the MSE as the loss function ("MSE"), that is $L(\theta) = E_D$,
- the MSE combined with only L2 regularisation ("MSE + L2"), that is $L(\theta) = \alpha E_\theta + \beta E_D$,
- the MSE combined with only physical regularisation ("MSE + PHY"), that is $L(\theta) = \beta E_D + \sum_k \lambda_k E_{phy,k}$, and
- the MSE combined with both L2 and physical regularisation ("MSE + L2 + PHY"), that is $L(\theta) = \alpha E_\theta + \beta E_D + \sum_k \lambda_k E_{phy,k}$.
To make a fair comparison, all networks were trained with the same set of initial network parameters. In this test, we used only P training and a feedback delay of $d_y = 5$, because they clearly showed preferable performance in the previous sections. The comparison is summarised in Table 2.
While the MSE on the training data is in a comparable range for all loss functions, differences in $MSE_{test}$ are observed. L2 regularisation helps to reduce overfitting of the training data, resulting in a lower test error for "MSE + L2" compared to "MSE". At first glance, the additional physical regularisation does not seem to further improve the results: $MSE_{test}$ of "MSE + PHY" is slightly worse compared to "MSE", and $MSE_{test}$ of "MSE + L2 + PHY" is of the same order as "MSE + L2", because another constraint is added to the objective function while the performance is measured only in terms of the MSE. Moreover, using only the physical regularisation ("MSE + PHY") instead of only L2 regularisation ("MSE + L2") decreases the test performance.
Even though the performance of both "MSE" and "MSE + L2" is better than that of "MSE + PHY", both fail to produce physically plausible predictions in several test datasets (outliers), as shown in Figure 9 and Figure 10 (the label "Reference" for the blue line refers to the synthetic test data obtained from the physical model), the clearest symptom being negative fraction values of CaO and H2O. One important aspect to consider is that the ANN was trained using only 100 training datasets, compared to almost 500 parameters inside the network. This makes the optimisation problem ill-posed, leading to clear overfitting of the networks trained with "MSE" and "MSE + L2". Physical regularisation tackles this problem even for relatively sparse training data, which is valuable when experiments are costly and therefore only little data are available to train the network.
Even though it produces the worst overall test performance, physical regularisation alone ("MSE + PHY") is able to produce physically plausible results even without L2 regularisation, see Figure 11. The figure illustrates the worst prediction result over all test sets obtained using "MSE + PHY". Even in this worst prediction with high error, the network is much more stable. With the addition of L2 regularisation in the "MSE + L2 + PHY" scheme, the prediction error (MSE) is further reduced so that it lies within the same range as "MSE + L2". The major difference is illustrated in Figure 12, where the worst prediction result produced by "MSE + L2 + PHY" is far more physically plausible, shown by the absence of unstable fluctuations as well as the relatively higher accuracy.
We trained the ANN using numerical simulation results, which indirectly imbues the ANN with the physics of the formulated governing equations. If the ANN were trained not on numerical simulation results but on real observation data (which could follow more complex, scientifically unexplored relationships), physical regularisation would still constrain the ANN training at least to fundamental, confirmed laws and prevent unnecessary overfitting to irregular and noisy observation data. As such, the method we present will be even more beneficial for applications with real observation data.
4. Conclusions
We adopted a PINN framework as an example of grey-box modelling to predict the dynamic internal states of a TCES system. Our approach follows the motivation of PINNs that sees the modelled system as a complex unit that is underrepresented by the governing equations used in the physical model. We do not construct the ANN merely as a surrogate for the expensive numerical model, unlike other PINN approaches that use ANNs to infer the governing equations of the modelled system or its parameter values under the assumption that the physical model describes the system perfectly. Our method was designed for application with real observation data that imply more complex processes than the simplified physical model. In this paper, however, we used synthetic data to train the ANN as a proof of concept.
As a key contribution, we propose to implement physical knowledge about the system in building the structure, choosing the training mode and designing the regularisation of ANNs, to assure the physical plausibility and to increase the performance of the TCES dynamic internal state predictions. The alignment between the system's behaviour (dynamic and nonlinear) and the ANN structure is described, and the ANN predictions using different regularisation strategies are compared to show the improvement provided by our method.
We show that, while training in P structure is computationally more expensive and less stable, the result is superior to training using the SP structure, because P training better resembles the dynamics of the governing differential equations. Additionally, we found that, due to the nonlinearity and long-term memory effects implied by the system equations, deeper recurrence is necessary. A moderate depth of feedback delay produces better prediction performance, resulting from the network's ability to capture the latency of the system. However, using too much feedback delay is counter-productive, as it no longer gives significant improvement and only increases the computational cost.
We also show that including physical regularisation in the training improves the physical plausibility of the network predictions, even in worst-case scenarios. Physical regularisation helps the network to learn the relationships between different input and target variables, as well as the time-dependency between them. This includes the mole and energy balance equations that serve as building blocks of the system's behaviour, along with simple monotonicity and non-negativity constraints. However, physical regularisation alone is not enough to improve the generalisation capability of the network, and therefore L2 regularisation is also necessary.
A very common issue with using ANNs for data-driven modelling is that obtaining experimental or operational data can be very costly, and therefore sufficient data to train the ANN are often not available. Our work shows that even with a relatively small amount of training data (compared to the number of network parameters), P training with a moderate feedback delay $d_y$, combined with physical regularisation, helps to prevent overfitting in this ill-posed optimisation problem and produces relatively accurate and physically plausible predictions of the CaO/Ca(OH)2 TCES internal states. Further work is required for a more sophisticated analysis of the system, for example with spatially distributed internal states, dynamic exogeneous inputs and uncertainty quantification of the predictions.
Table 1. Physical error terms $e_{phy,i,t,k}$ for the dehydration and hydration processes (rows 1-5 are identical for both processes).

| k | Dehydration: $e_{phy,i,t,k}$ | Hydration: $e_{phy,i,t,k}$ |
|---|---|---|
| 1 | $e_{MB}(i,t)$ | $e_{MB}(i,t)$ |
| 2 | $e_{EB}(i,t)$ | $e_{EB}(i,t)$ |
| 3 | $\mathrm{ReLU}(-\hat{\nu}_{CaO}(i,t))$ | $\mathrm{ReLU}(-\hat{\nu}_{CaO}(i,t))$ |
| 4 | $\mathrm{ReLU}(-\hat{x}_{H_2O}(i,t))$ | $\mathrm{ReLU}(-\hat{x}_{H_2O}(i,t))$ |
| 5 | $\mathrm{ReLU}(\phi+\hat{\nu}_{CaO}(i,t)-1)$ | $\mathrm{ReLU}(\phi+\hat{\nu}_{CaO}(i,t)-1)$ |
| 6 | $\mathrm{ReLU}(\hat{p}(i,t-1)-\hat{p}(i,t))$ | $\mathrm{ReLU}(\hat{p}(i,t)-\hat{p}(i,t-1))$ |
| 7 | $\mathrm{ReLU}(\hat{T}(i,t-1)-\hat{T}(i,t))$ | $\mathrm{ReLU}(\hat{T}(i,t)-\hat{T}(i,t-1))$ |
| 8 | $\mathrm{ReLU}(\hat{\nu}_{CaO}(i,t-1)-\hat{\nu}_{CaO}(i,t))$ | $\mathrm{ReLU}(\hat{\nu}_{CaO}(i,t)-\hat{\nu}_{CaO}(i,t-1))$ |
| 9 | $\mathrm{ReLU}(\hat{T}(i,t)-T_{in})$ | $\mathrm{ReLU}(T_{in}-\hat{T}(i,t))$ |
Table 2. Training and test MSE for the different loss functions.

| Loss Function | $MSE_{train}$ | $MSE_{test}$ |
|---|---|---|
| MSE | $8.45\times10^{-3}$ | $2.81\times10^{-3}$ |
| MSE + L2 | $9.01\times10^{-3}$ | $3.96\times10^{-4}$ |
| MSE + PHY | $8.68\times10^{-3}$ | $3.83\times10^{-3}$ |
| MSE + L2 + PHY | $8.43\times10^{-3}$ | $3.96\times10^{-4}$ |
Availability of Data and Materials
Input-output data pairs used to train, validate, and test the ANN are available as supplementary materials on https://doi.org/10.18419/darus-633, while the trained network parameters (weight and bias values) for different regularisation methods explained in Section 3.3 are also available on https://doi.org/10.18419/darus-634. Each dataset is accompanied with a 'README' file that explains the data format.
Author Contributions
Conceptualisation, T.P., T.W., S.O. and W.N.; methodology, T.P., T.W., S.O. and W.N.; software, T.P. and T.W.; validation, T.P. and T.W.; formal analysis, T.P., T.W., S.O. and W.N.; investigation, T.P. and T.W.; resources, T.P. and T.W.; data curation, T.P.; writing-original draft preparation, T.P. and T.W.; writing-review and editing, S.O. and W.N.; visualisation, T.P. and T.W.; supervision, T.P., S.O. and W.N.; project administration, S.O. and W.N.; funding acquisition, S.O. and W.N. All authors have read and agreed to the published version of the manuscript.
Funding
This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy-EXC-2075-390740016.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
Abbreviations: | |
ANN | Artificial Neural Network |
BR | Bayesian Regularisation |
LM | Levenberg–Marquardt
MSE | Mean Squared Error |
NARX | Nonlinear Autoregressive Network with Exogeneous Inputs |
ODE | Ordinary Differential Equation |
P | Parallel (network structure) |
PDE | Partial Differential Equation |
PINN | Physics Inspired Neural Network |
RNN | Recurrent Neural Network |
SP | Series Parallel (network structure) |
TCES | Thermochemical Energy Storage |
TCES-related parameters: | |
ΔH | Reaction enthalpy |
λeff | Average thermal conductivity |
μ | Viscosity |
νs | Solid volume fraction |
ϕ | Porosity |
ρm | Mass density |
ρn | Molar density |
t | Time |
cp | Specific heat capacity |
D | Effective diffusion coefficient |
h | Specific enthalpy |
K | Permeability |
kR | Reaction constant |
p | Pressure |
q | Source/sink term |
T | Temperature |
u | Specific internal energy |
xg | Gas molar fraction |
ANN-related parameters: | |
α | Normalising constant of L2 regularisation term |
β | Normalising constant of data-related error |
H | Hessian matrix |
I | Identity matrix |
J | Jacobian matrix |
y^(t) | Predicted value at time t |
λ | Normalising constant of physical error |
μ | Damping parameter for LM algorithm |
θ | Network parameter |
b | Network bias |
dy | Feedback delay |
ED | Data-related error |
Eθ | Mean squared value of network parameters |
Ephy | Physical error |
L(θ) | Loss function
N | Number of network parameters |
n | Number of training samples |
nt | Number of time steps |
u | Exogeneous input |
w | Network weight |
y(t) | Observed value at time t |
Appendix A. List of Exogeneous Input and Its Distribution
Table A1 lists all exogeneous inputs with their corresponding distributions. The distributions are centred around the values taken from [20]. A sketch of sampling from these distributions is given after the table.
Table A1. Input distributions for the exogenous inputs, with $\mu$ and $\sigma$ being the mean and standard deviation used to generate the data, respectively; the superscripts D and H refer to the dehydration and hydration process, respectively.
Exogenous inputs with normal distribution:

| $u$ | Unit | $\mu^D$ | $\sigma^D$ | $\mu^H$ | $\sigma^H$ |
|---|---|---|---|---|---|
| $\rho_{CaO}$ | kg/m³ | 1656 | 25 | 1656 | 25 |
| $\rho_{Ca(OH)_2}$ | kg/m³ | 2200 | 25 | 2200 | 25 |
| $p_{init}$, $p_{out}$ | Pa | $1\times10^{1}$ | $2.3\times10^{3}$ | $2\times10^{5}$ | $2.3\times10^{3}$ |
| $T_{init}$ | K | 573.15 | 20 | 773.15 | 20 |
| $T_{in}$ | K | 773.15 | 20 | 573.15 | 20 |
| $\dot{n}_{N_2,in}$ | mol/(s·m) | 4.632 | 0.25 | 2.04 | 0.15 |
| $\dot{n}_{H_2O,in}$ | mol/(s·m) | 0.072 | 0.01 | 1.782 | 0.15 |

Exogenous inputs with lognormal distribution:

| $u$ | Unit | $\mu^D$ | $\sigma^D$ | $\mu^H$ | $\sigma^H$ |
|---|---|---|---|---|---|
| $K$ | mD | $\log(5\times10^{3})$ | 0.525 | $\log(5\times10^{3})$ | 0.525 |
| $k_R$ | - | $\log(0.05)$ | 0.5 | $\log(0.2)$ | 0.5 |

Exogenous inputs with shifted and scaled beta distribution:

| $u$ | Unit | a | b | scale | shift |
|---|---|---|---|---|---|
| $c_{p,CaO}$ | J/(kg·K) | 7.1 | 2.9 | 300 | 700 |
| $c_{p,Ca(OH)_2}$ | J/(kg·K) | 7.6 | 2.4 | 350 | 1250 |
| $\lambda_{CaO}$ | W/(m·K) | 6.5 | 3.5 | 0.6 | - |
| $\lambda_{Ca(OH)_2}$ | W/(m·K) | 6.5 | 3.5 | 0.6 | - |
| $\phi$ | - | 8.5 | 1.5 | 0.825 | - |
| $\Delta H$ | J/mol | 4.8 | 5.2 | $3\times10^{4}$ | $9\times10^{4}$ |
| $x_{H_2O,init}$ | - | 76 | 85 | - | - |
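For illustration, the following sketch draws a dehydration ensemble from a subset of these distributions; our reading of the lognormal parameters (mean of the log) and of the shifted/scaled beta distribution is an assumption about the table's parametrisation.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_dehydration_inputs(n):
    """Draw n samples of selected exogenous inputs (dehydration columns of Table A1)."""
    return {
        "rho_CaO": rng.normal(1656.0, 25.0, n),              # kg/m^3, normal
        "T_init": rng.normal(573.15, 20.0, n),               # K, normal
        "T_in": rng.normal(773.15, 20.0, n),                 # K, normal
        "K": np.exp(rng.normal(np.log(5e3), 0.525, n)),      # mD, lognormal
        "k_R": np.exp(rng.normal(np.log(0.05), 0.5, n)),     # -, lognormal
        "cp_CaO": 700.0 + 300.0 * rng.beta(7.1, 2.9, n),     # J/(kg K), shifted/scaled beta
        "phi": 0.825 * rng.beta(8.5, 1.5, n),                # -, scaled beta
    }

ensemble = sample_dehydration_inputs(100)
```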
Appendix B. Mole and Energy Balance Error
The physical constraints $e_{phy,i,t,1}$ and $e_{phy,i,t,2}$ use the mole and energy balance equations, respectively. Both balances are calculated in a simplified way, discretized with time steps of 5 s, using spatially averaged values and assuming local thermal equilibrium between the gas and the solid. For clarity, the training sample index $i$ is omitted in this section.
The mole balance is formulated for H2O (assuming that the density can be calculated with the ideal gas law) with the in- and outflowing moles $n_{H_2O,in}$ and $n_{H_2O,out}$, the storage term in the gaseous phase $\Delta n_{H_2O,sto}$ and the source/sink term $\Delta n_{H_2O,q}$. The in- and outflowing moles of H2O are both known from the simulation or input data. The storage term $\Delta n_{H_2O,sto}$ can be calculated from the change in the H2O molar fraction, $\hat{x}_{g,H_2O}(t) - \hat{x}_{g,H_2O}(t-1)$, multiplied with the H2O molar density and the pore volume. The complete definition is written as:
$$\Delta n_{H_2O,sto}(t) = \phi\, V\, \bigl(\hat{x}_{g,H_2O}(t) - \hat{x}_{g,H_2O}(t-1)\bigr)\, \rho_{n,H_2O}(t). \tag{A1}$$
The source/sink term $\Delta n_{H_2O,q}$ is calculated from the molar amount of CaO formed. Based on the stoichiometric ratio, for every mole of CaO formed, one mole of H2O is also formed. The molar amount of CaO is determined by the change in the CaO volume fraction, $\hat{\nu}_{CaO}(t) - \hat{\nu}_{CaO}(t-1)$, multiplied with the molar density and the volume. The calculation of $\Delta n_{H_2O,q}$ is written as:
$$\Delta n_{H_2O,q}(t) = V\, \bigl(\hat{\nu}_{CaO}(t) - \hat{\nu}_{CaO}(t-1)\bigr)\, \rho_{n,CaO}. \tag{A2}$$
Finally, Equations (A1) and (A2) are substituted into Equation (17).
For the energy balance formulation, the energies of the inflowing and outflowing gas, $Q_{in}$ and $Q_{out}$, are also known from the simulation or from input data. The energy storage in the gaseous phase is neglected, as its contribution is negligible; only the solid contribution is used in the calculation of $\Delta Q_{sto}$. The solid energy change is calculated as the change in both the CaO and Ca(OH)2 mass multiplied by the temperature and the specific heat capacity. The definition is written as:
$$\Delta Q_{sto}(t) = Q_{sto}(t) - Q_{sto}(t-1), \tag{A3}$$
where $Q_{sto}(t)$ is defined as:
$$Q_{sto}(t) = V\,\Bigl[\hat{\nu}_{CaO}(t)\, c_{p,CaO}\, \rho_{m,CaO} + \bigl(1 - \phi - \hat{\nu}_{CaO}(t)\bigr)\, c_{p,Ca(OH)_2}\, \rho_{m,Ca(OH)_2}\Bigr]\, \hat{T}(t). \tag{A4}$$
The source/sink term for the energy balance equation, $\Delta Q_q$, is calculated from the change in the molar amount of CaO multiplied by the specific reaction enthalpy, minus the volume expansion work. The negative sign corresponds to the definition in Equation (7). The calculation is written as:
$$\Delta Q_q(t) = -V\,\bigl(\hat{\nu}_{CaO}(t) - \hat{\nu}_{CaO}(t-1)\bigr)\,\rho_{n,CaO}\left(\Delta H - \frac{\phi}{1-\phi}\,\hat{T}(t)\,R\right). \tag{A5}$$
Equations (A3) and (A5) are then substituted into Equation (18).
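As a sketch of how these discretized balances can be evaluated on a predicted series, the function below assembles the H2O mole balance error of Equation (17) from Equations (A1) and (A2); the argument names, the constant-inflow assumption and the array layout are ours, and the energy balance error follows analogously from Equations (18) and (A3)-(A5).

```python
import numpy as np

def mole_balance_error(x_H2O_hat, nu_CaO_hat, rho_n_H2O, rho_n_CaO,
                       n_H2O_in, n_H2O_out, phi, V):
    """H2O mole balance error e_MB (Eq. (17)) on the coarse 5 s balance time steps.
    x_H2O_hat, nu_CaO_hat, rho_n_H2O and n_H2O_out are arrays over these steps;
    n_H2O_in is treated as a constant inflow and rho_n_CaO as a constant density."""
    dn_sto = phi * V * np.diff(x_H2O_hat) * rho_n_H2O[1:]    # storage change, Eq. (A1)
    dn_q = V * np.diff(nu_CaO_hat) * rho_n_CaO               # source/sink term, Eq. (A2)
    return n_H2O_out[1:] - n_H2O_in + dn_sto - dn_q          # Eq. (17)
```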
Appendix C. Example of the ANN Prediction
Figure A1 shows the best prediction of the ANN using 2 hidden layers with 15 and 8 nodes, respectively, trained with only MSE and L2 regularisation in the loss function.
Figure A1. An example of the best prediction sample (red) obtained using 2 hidden layers with 15 and 8 nodes, respectively, and the reference solution obtained from the physical model (blue).
Abstract
Thermochemical Energy Storage (TCES), specifically the calcium oxide (CaO)/calcium hydroxide (Ca(OH)2) system is a promising energy storage technology with relatively high energy density and low cost. However, the existing models available to predict the system’s internal states are computationally expensive. An accurate and real-time capable model is therefore still required to improve its operational control. In this work, we implement a Physics-Informed Neural Network (PINN) to predict the dynamics of the TCES internal state. Our proposed framework addresses three physical aspects to build the PINN: (1) we choose a Nonlinear Autoregressive Network with Exogeneous Inputs (NARX) with deeper recurrence to address the nonlinear latency; (2) we train the network in closed-loop to capture the long-term dynamics; and (3) we incorporate physical regularisation during its training, calculated based on discretized mole and energy balance equations. To train the network, we perform numerical simulations on an ensemble of system parameters to obtain synthetic data. Even though the suggested approach provides results with an error of $3.96\times10^{-4}$, which is in the same range as the result without physical regularisation, it is superior compared to conventional Artificial Neural Network (ANN) strategies because it ensures physical plausibility of the predictions, even in a highly dynamic and nonlinear problem. Consequently, the suggested PINN can be further developed for more complicated analysis of the TCES system.