1 Introduction
Near-surface soil moisture is a critical variable for understanding land–atmosphere interactions. Its applications range from hydrological modeling and agricultural water management to the prediction of natural disasters . Near-surface soil moisture is also a dominant factor that regulates microbial activity and organic matter dynamics .
With the technological advancement of soil moisture sensors and remote sensing, the availability of soil moisture data is proliferating
Physical modeling of soil moisture is commonly conducted by solving a partial differential equation (PDE) called the Richardson–Richards equation (RRE) . Water transport properties in soils are expressed in the RRE as two relations, the water retention curve (WRC) and the hydraulic conductivity function (HCF), which are both highly nonlinear functions . This double nonlinearity makes it challenging to analyze and solve the RRE, which has been investigated by soil physicists and mathematicians for many decades . Indeed, the RRE is one of the most challenging PDEs in hydrology, and the large-scale soil moisture modeling based on the RRE has been prohibitive due to the computational demand and its unreliable solution .
Artificial neural networks (NNs) have attracted attention as an alternative numerical solver of PDEs recently. While there are some similarities between artificial NNs and biological NNs (i.e., our brain), in this context, artificial NNs should be considered as mathematical functions with specific architectures with many adjustable parameters. Thus, artificial NNs are simply referred to as NNs in this study. and mathematically proved that NNs can approximate any continuous function under certain conditions. The application of NNs to various fields relies on this so-called universal approximation capability to find functional relationships between available input data and target variables. Regardless of the tremendous implication of the universal approximation capability, it is not easy to find NNs that can approximate such functional relationships. Therefore, significant efforts have been devoted in the machine learning community to investigate how to adjust NN parameters (or train NNs) for specific problems.
The solution to PDEs is a functional relationship between dependent and independent variables. Therefore, it is straightforward to apply NNs to approximate the solution. were among the first to use NNs to approximate the solution to boundary value problems for PDEs. To train NNs, they defined a loss function so that NNs satisfy both PDEs and the corresponding boundary conditions and minimized the loss function to estimate the NN parameters. The minimization of the loss function required the computation of the gradient of the loss function with respect to the NN parameters as well as the partial derivatives of the PDEs. At that time, such gradients and partial derivatives were computed by manual derivation and programming. However, recent progress in the software environment (e.g., TensorFlow, , and PyTorch, ) implementing automatic differentiation enabled us to compute such gradients and partial derivatives in a scalable manner with advanced processors such as graphics processing units (GPUs). With the technological progress, the NN-based methods to solve PDEs were reformulated as physics-informed neural networks (PINNs) by . PINNs have been applied to both forward and inverse modeling of PDEs in various fields, such as incompressible flows , subsurface transport , and water dynamics in soils . provide a comprehensive review of PINN approaches.
PINNs are particularly promising for ill-posed forward and inverse modeling. Traditional numerical solvers, such as finite difference methods (FDMs) and finite element methods (FEMs), require well-defined initial and boundary conditions, but they are often incomplete (i.e., ill-posed). This is usually the case for simulating near-surface soil moisture in real-world field condition, where obtaining accurate initial and boundary conditions is not possible without costly instrumentation (e.g., lysimeters or flux towers)
A natural question on PINNs is whether PINNs can replace traditional numerical solvers. According to previous studies and our experiences, it is currently hard for PINNs to achieve the same accuracy as traditional methods in a competitive computational time. However, PINNs might be a competitive numerical solver for highly nonlinear PDEs (e.g., the RRE and the Navier–Stokes equation) in high dimensions, where traditional numerical solvers become computationally demanding. For example, when simulating water flow under infiltration based on the RRE using traditional numerical methods, the spatial mesh must be very small near the soil surface to obtain reliable solutions, which is computationally very expensive or intractable for a three-dimensional watershed scale. Although there is progress in numerical methods other than PINNs, including a discontinuous Galerkin method with adaptive spatial and temporal meshes , it is worthwhile to seek the potential of PINNs to solve the RRE.
Figure 1
(a) Schematic description of a soil consisting of two distinct layers with soil moisture sensors. (b) Physics-informed neural networks (PINNs) with domain decomposition for a two-layered soil. For each input and , physics and data constraints are computed through automatic differentiation. The neural network parameters and are estimated by minimizing the loss function . (c) Computation by a single unit. The input values (, , ) are summed with weights (, , ) and added by a bias term . The result is fed into the activation function . (d) The tanh function as an adaptive activation function with a fixed scaling parameter and a trainable slope parameter . The figure was inspired by .
[Figure omitted. See PDF]
In the study, we conducted a comprehensive analysis of PINNs as a forward and inverse numerical solver of the one-dimensional RRE for homogeneous and heterogeneous soils. This paper is a continuation of our previous work , where PINNs were used to estimate soil hydraulic properties from soil moisture measurements. Unlike the previous work, we investigated the ability of PINNs to solve the forward problem in addition to the inverse problem, and the framework was extended to heterogeneous soils. Because of the rapid progress in the field, it is impossible to test all the variants of PINNs and their training methods. Accordingly, we implemented some promising PINN methods, including layer-wise locally adaptive activation function (L-LAAF) and domain decomposition , where two NNs interact with each other through interface conditions to account for the discontinuity of hydraulic conductivity across layers (see Fig. ). To validate and evaluate the solutions derived from PINNs, we compared them to the analytical solutions given by and numerical solutions by an FDM and an FEM. The effects of the architecture of NNs and various parameters on the performance of PINNs were investigated. In addition to the forward modeling, we conducted inverse modeling using synthetic data, where a surface water flux upper-boundary condition was estimated from near-surface soil moisture measurements by PINNs. Finally, we discuss current challenges and future perspectives of PINNs for forward and inverse modeling of soil moisture dynamics based on the RRE.
2 Methods2.1 Richardson–Richards equation
We consider water transport in unsaturated isothermal rigid soils. In this study, hysteresis and vapor flow are ignored, and soil hydraulic properties are isotropic. The mass balance of water in a control volume implies the continuity equation:
1 where is the volumetric water content [L L], is the time [T], is the water flux in three dimensions [L T], and is the source term [T]. The water flux can be derived from the Buckingham–Darcy law : 2 where is the hydraulic conductivity [L T], and is the total water head [L], which is the sum of the water potential in soils [L] and the elevation head (positive upward). Equations () and () are combined to derive the Richardson–Richards equation (RRE) : 3 This form of the RRE is called the mixed-form RRE, where both the volumetric water content and the water potential appear in the equation. To solve the RRE, the two relationships (i.e., and ) need to be defined. The relationship is called the water retention curve (WRC), while is referred to as the hydraulic conductivity function (HCF). The WRC and HCF are called constitutive relationships of the RRE that characterize the movement of the water in the pore space. WRCs and HCFs are commonly expressed by parametric models such as the Brooks and Corey model , the van Genuchten–Mualem model , and the Kosugi model . In this study, the one-dimensional RRE without the source term is studied, which is written as 4 The zero source term means the neglect of plant water uptake, which is valid for bare soils or soil moisture dynamics under infiltration. The one-dimensional assumption can be reasonable because water flow in near-surface soils is predominantly vertical.
2.2 Analytical solutionsIt is difficult to obtain analytical solutions to the RRE because of the nonlinearity of the WRC and the HCF. In particular, analytical solutions to the RRE for layered soils are extremely scarce. provided one of a few analytical solutions to the transient one-dimensional RRE for both homogeneous and two-layered soils. The analytical solutions are based on the linearization of the RRE using the following relationships for WRCs and HCFs for : where is the residual water content [L L], is the saturated water content [L L], is the pore-size distribution parameter [L], and is the saturated hydraulic conductivity [L T]. Note that the parameter can be interpreted using van Genuchten parameters and , as . Although the parametric expressions for WRCs and HCFs as well as the parameter values used in the study do not necessarily represent hydraulic properties of real soils, the analytical solutions can serve to validate and assess the performance of PINN solutions to the RRE.
2.2.1 Homogeneous soil
The analytical solution for a homogeneous soil requires a set of initial and boundary conditions. The lower-boundary condition is a Dirichlet boundary condition at , where is the vertical length of the soil [L]. The initial condition is the steady-state solution of the RRE determined by the lower-boundary condition and a constant water flux upper-boundary condition at . The analytical solution to the time-dependent RRE with the initial and lower-boundary condition as well as a constant water flux upper-boundary condition at is written in terms of :
7 where , , ; , , and is the positive roots of the equation . The analytical solution with respect to the volumetric water content can be computed from through Eqs. and . The explicit analytical solution clarifies larger introduces stronger nonlinearity of the solution.
2.2.2 Heterogeneous soilprovided one-dimensional analytical solutions of heterogeneous soils (i.e., two-layered soil). The analytical solution is based on the assumption that is the same for both layers. Therefore, this analytical solution is limited to analyzing layered soils that have a discontinuity in the hydraulic conductivity across the layers. In fact, the volumetric water content is continuous across the layers as the water potential . The initial and boundary conditions are the same as the homogeneous case. The analytical solution is much more complicated than the homogeneous one, so we refer to the original literature for the detail . However, we provide the computed analytical solutions and numerical derivations on , which can be useful to validate other numerical methods.
2.3 Mathematical formulation of PINNs
2.3.1 Feedforward neural networks
Feedforward NNs are used to approximate the solution of PDEs in PINNs. In this section, the mathematical formulation of feedforward NNs with hidden layers with layer-wise locally adaptive activation functions (L-LAAFs) is introduced. NNs are mathematical functions mapping an input vector to an output vector :
8 The hat operator represents prediction throughout the paper. NNs are often represented as layers of units (or neurons), as in Fig. b, where two feedforward NNs consisting of four hidden layers with six units are shown.
NNs are compositions of affine transformations (the composition of linear tranformation and translation) and nonlinear functions. Herein, for an integer such that represents the vector value corresponding to the th hidden layer consisting of units. for each is computed in the following manner: 9 where and are the weight matrix and bias vector for the th hidden layer, is a fixed scaling factor, and represents a trainable parameter changing the shape of the element-wise activation function . The output of the NN is computed as 10 where is the output function, and and are the weight matrix and bias vector for the output layer. The collection of the weight matrices , …, }, the bias vectors , …, }, and the slope parameters , …, } are the parameters of the NN, which are denoted by in this paper.
To understand the role of the parameters and introduced in the L-LAAF , consider a case where for all , and is the identity function. In this case, the neural network is nothing but an affine transformation and cannot learn a nonlinear relationship. In a standard NN, nonlinear activation functions, such as the hyperbolic tangent function (tanh), are used with for all to learn nonlinear relationships between input and output variables. As shown in Fig. d, the tanh function has a “linear” regime near the origin and exhibits the nonlinearity outside the region. By increasing the parameter , we can increase the slope of the activation function and narrow the “linear” regime. reported that larger accelerated the training of PINNs and captured the high-frequency components of the solution of PDEs, while values that were too large made the training unstable. Throughout the study, was initialized to 0.05, and was set to 20 when the L-LAAF was used.
2.3.2 Formulation of PINNs for the RREIn this section, PINNs to solve the forward and inverse problems for the RRE are described. First, we consider the forward modeling of the one-dimensional RRE defined on a domain , the boundary , and the time [0, T): where is the initial condition, is the Dirichlet boundary condition on the Dirichlet boundary , is the water flux in the vertical direction (positive upward), and is the water flux boundary condition on the flux boundary . Here, we only use the volumetric water content for the initial condition and the Dirichlet boundary condition because the measurement of is more reliable than the water potential in practical situations, though the modification to the water potential is straightforward (see Sect. S1.5 in the Supplement). Although we limit ourselves to either the Dirichlet or the water flux boundary condition, the framework can be extended to other boundary conditions. In this study, we focus on a particular situation where the soil surface () is set to , and the bottom () is set to , which corresponds to when soil moisture dynamics is induced by the surface water flux (i.e., evaporation or infiltration) while the volumetric water content at the bottom is kept to .
PINNs aim to approximate the solution of the one-dimensional RRE by a NN with the NN parameters . Because the water potential is negative in unsaturated soils, we used the identity function for the output function of the NN (Eq. ) and transformed the output as
15 where is a fixed parameter, which can allow to be zero or positive (saturated). To construct PINNs for the RRE, the residual of the RRE is defined as 16 Here, and are computed from with the predefined WRC and HCF (i.e., and ). The partial derivatives in the residual are computed through the reverse-mode automatic differentiation . In this method, all the computations of PINNs are formulated as computational graphs by the software (TensorFlow in this study), and any derivatives related to the computations can be computed exactly, unlike numerical differentiation. This algorithm is highly efficient when the number of parameters is large and thus suitable for training neural networks with tens of thousands or millions of parameters. The collection of the NN parameters is identified by minimizing a loss function, which is defined as 17 where is the weight parameter corresponding to the loss term for the residual of the RRE , and is the weight parameter for the loss term for , where , , , and represent the measurement data, the initial condition, and the Dirichlet and the water flux boundary conditions, respectively. and for are defined as where , denotes the residual points (also called collocation points) at which the residual of the PDE is evaluated, , , denotes the measurement data points, denotes the initial condition points, , denotes the Dirichlet boundary condition points, and , denotes the water flux boundary condition points. Here, forces PINNs to satisfy the PDE, and helps PINNs to fit the measurement data, while , , and enforce the initial, the Dirichlet and the water flux boundary conditions, respectively. Note that initial and boundary conditions can be encoded in a hard manner so that the approximated solution automatically satisfies these conditions . However, we encode these conditions in a soft manner and treat them as data points as in Eq. () to leverage the flexibility of PINNs, which is essential in practical situations where precise initial and boundary conditions are rarely available.
In the framework, the measurement data can be provided, which is not necessary to make the forward modeling well-posed. PINNs can easily incorporate such additional measurement data to improve accuracy and computational efficiency. As for the inverse modeling, the implementation of the PINNs is almost identical to the forward modeling. If we invert physical parameters in PDEs from measurement data (e.g., saturated hydraulic conductivity ), the parameters and the NN parameters are simultaneously estimated. Also, one can drop the loss terms for the initial or boundary conditions if they are not available, though this makes the problem ill-posed. In the study, the forward and the inverse modeling was conducted in Sects. and , respectively.
2.3.3 Errors in PINN solutionsDespite the increasing popularity and successes of PINNs in various fields, the theoretical understanding of PINNs is still limited. were among the first who conducted a rigorous analysis of PINNs; they formulated the generalization error of PINNs as the sum of the approximation error, the optimization error, and the estimation error. The approximation error is the distance between the best possible approximation by PINNs and the solution of PDEs. The optimization error is due to the difficulty in minimizing the nonlinear and non-convex loss function. The estimation error is caused by the insufficiency of data to train PINNs. They demonstrated theoretically and numerically that the sum of the approximation and the estimation error decreased with the increase in the training data for linear second-order elliptic and parabolic PDEs.
provided a theoretical framework to estimate the generalization error of PINNs for a variety of PDEs, including nonlinear PDEs. They demonstrated that the generalization error would be sufficiently low if (1) PINNs are trained well (i.e., small optimization error), (2) the number of residual points is large, and (3) the solution of PDEs is sufficiently regular. In the study, we numerically analyzed the accuracy of PINN solutions to the RRE and investigated whether their theoretical claims could be applied to our case.
2.4 Implementation of PINNs
Training of NNs requires trial and error because the theoretical understanding of the mechanism is still limited. However, feedforward NNs used in PINNs have been investigated for many years, so empirical knowledge is available . Those techniques are not new, but we would like to reiterate some of them with the explanation of our implementation of PINNs. The PINN algorithm for heterogeneous soils is summarized in Fig. . We used TensorFlow to implement PINNs, and the source code is available in .
Figure 2
The algorithm for PINNs with domain decomposition for a two-layered soil. and refer to the spatial domain for the upper and lower layers, respectively. In Step 3, max_iteration_Adam is the maximum number of the Adam iteration (set to 100 000 in the study); represents the update for the neural network parameter ; L-BFGS-B stopping criteria are summarized in Sect. .
[Figure omitted. See PDF]
2.4.1 Architecture of neural networksBefore training NNs, it is recommended to transform input data so that the components of input variables have zero mean, and each variable has a similar variance. In our implementation, we did not transform or normalize input data (i.e., and ). However, when both input variables were positive, the training of PINNs was difficult, as mentioned in . Thus, it is better to transform input data for future studies. As for output variables, it is also important to take into account their range, as in Eq. ().
The architecture of NNs (i.e., the number of hidden layers and units for , …, ) determines the complexity of functions the NN can learn and thus depends on PDEs of interest and the corresponding initial and boundary conditions. It is known that the expressive capability of NNs grows exponentially with the number of hidden layers . We used the same number of units for all hidden layers, and the effects of the architecture of NNs on PINN performance were experimentally investigated. As for activation functions, symmetric activation functions with respect to the origin, such as the tanh function, are preferable to nonsymmetric functions such as the sigmoid function. Because PINNs require the second derivative of state variables in the loss function, we used the tanh function for the activation functions for all hidden layers.
2.4.2 Initialization
At the beginning of the training, the collection of the weight parameters and the bias parameters have to be initialized. demonstrated that simple random initialization caused the activation functions to be “saturated”, meaning that the slope of the activation function becomes zero. To prevent this, they introduced the Xavier initialization for the weight parameters , which was used in our implementation. The bias parameters were initialized to be 0. Because the initialization of significantly affects PINN solutions, different sets of randomization must be tested. Therefore, we used 10 different random seeds for the initialization of each setting of PINNs. In addition to and , we can tune the parameter in Eq. () to give different initializations of .
2.4.3 Training
PINNs were trained to minimize the loss function (Eq. ). The loss term for the residual was evaluated at randomly sampled residual points in the spatial and temporal domain (Fig. S1b in the Supplement). For each problem, the same residual points were used. We tested the residual-based adaptive refinement algorithm proposed by , where residual points are chosen where the residual of PDEs is high. For our case study, the effectiveness of the algorithm was minor, and thus the results are only shown in the Supplement (see Sect. S1.1 and Fig. S1).
The weight parameters in the loss function (Eq. ) play a crucial role in minimizing the loss function. In the original PINN framework proposed by , all were set to 1. However, demonstrated that the loss terms corresponding to initial and boundary conditions need to be penalized more, and optimal values of those weight parameters are problem-dependent. To overcome this challenge, they proposed the learning rate annealing algorithm , where the weight parameters in the loss function are updated during training to balance the relative importance of each loss term. We tested the algorithm, but this resulted in a modest improvement compared to the L-LAAF, so the results are given in the Supplement (Sect. S1.2 and Fig. S2). In the study, the effects of were investigated in Sect. .
It is common to use a stochastic gradient descent algorithm to minimize the loss function to train NNs. In this study, we used the Adam algorithm . Because the Adam optimizer is not enough to achieve solutions with high accuracy, previous studies on PINNs employed a two-step optimization strategy, and we followed it. First, the loss function was minimized using 10 iterations of the Adam algorithm in TensorFlow with the exponential decay of the learning rate. The initial learning rate was set to 0.001 with the decay rate of 0.90, the decay step was set to 1000, and the other parameters were set to their default values. The Adam algorithm used a “mini batch” of the data, where only 128 of all residual points were considered, while all the initial and boundary data points were used for each iteration. After the Adam algorithm, the loss function was further minimized through the L-BFGS-B optimizer from Scipy , which was terminated once the loss function converged with prescribed thresholds. The L-BFGS-B algorithm can utilize the information on the approximated curvature of the loss function and find a better local minimum than a stochastic gradient descent for the case of PINNs. The following L-BFGS-B parameters were used: maxcor 50, maxls 50, maxiter 50 000, maxfun 50 000, ftol , gtol , and the default values for the other parameters. Although it is desirable to tune the parameters for each optimizer, we fixed those parameters in the study.
2.4.4 Domain decomposition
Natural soils have distinctive layering, and the hydraulic properties of each layer vary between the layers, which results in continuous but not a differentiable water potential distribution in the soil profile. To deal with such spatial heterogeneity, proposed a domain decomposition method for PINNs, where a computational domain is divided into subdomains, and a NN is assigned to each subdomain. Then, the NNs interact with each other during the training through interface conditions such as the continuity of mass and flux. Such interface conditions can be incorporated into the loss function. For simulating water flow in a two-layered soil, two NNs and are assigned to the upper and lower layer, and the continuities of water potential, water flux, and the residual of the RRE are imposed at the boundary: where the subscripts “U” and “L” mean a value with the subscripts (e.g., ) was computed by and , respectively; represents the spatial coordinate of the interface. These interface conditions are incorporated into the loss function as a loss term (Eq. ): where is the number of points on the interface, where the loss terms are evaluated, and and are the neural network parameters for and , respectively. In the implementation, we found that the logarithmic transformation of water potential was helpful to balance the loss terms. Therefore, instead of , we imposed the continuity of the output of the neural networks, as in
29 Note that in the original literature, trained each NN separately by constructing multiple loss functions for parallel computation, but we trained the two NNs ( and ) simultaneously. The algorithm is summarized in Fig. . Although our algorithm is for a two-layered soil in one dimension, this method can be extended to more layers with more complex geometries in higher dimensions .
2.5 Evaluation of numerical errorNumerical errors of PINNs and the other numerical methods (i.e., FDMs and FEMs) in terms of the volumetric water content and the water potential were evaluated by comparing the analytical solutions to those numerical solutions computed on a uniform grid with a spatial step of 0.1 cm and temporal step of 0.1 h. Absolute error, relative absolute error, squared error, and relative squared error were computed. Because these different types of errors exhibited strong correlations, we show, in the following sections, relative squared error in terms of the volumetric water content , defined as
30 where and represent analytical and numerical solutions, respectively.
3 Forward modelingIn this section, we report the main results of the forward modeling of water transport in homogeneous and heterogeneous soils using PINNs. PINN solutions of the RRE with varying NN architectures and parameters were evaluated using the one-dimensional analytical solutions by for homogeneous and heterogeneous soils introduced in Sect. . To assess the performance of PINNs, numerical solutions obtained by an FDM and an FEM are also presented. The implementation of the FDM for the homogeneous case is described in Sect. S1.3, while the FEM solution for the heterogeneous case was obtained by HYDRUS-1D .
3.1 Homogeneous soil
We simulated water infiltration into a homogeneous soil using the RRE. This simple setup was used to understand the characteristics of PINNs with different settings. We specifically investigated (1) the effects of NN architecture (the number of layers and units as well as the use of the layer-wise locally adaptive activation function (L-LAAF)), (2) the effects of the weight parameters in the loss function (Eq. ), and (3) the effects of the number of the residual points and the upper-boundary data points.
3.1.1 Problem setup
We considered soil moisture dynamics in a homogeneous soil induced by a constant water flux on the surface ( cm), as introduced in Sect. , where cm, h, cm h, cm h, cm, , , cm, and cm h.
3.1.2 Characteristics of PINN solution
The numerical solution by PINNs and the analytical solution are shown in Fig. a. The PINN solution was obtained by using a NN of five hidden layers with 50 units using the L-LAAF, which was determined after testing various settings described in the following sections. initial data points (, , …, , 0.0 cm), upper-boundary data points (, 0.02, …, 9.99, 10.0 h), and lower-boundary data points (, 0.2, …, 9.9, 10.0 h) were used to train the NN. The number of residual points was set to 10 000. The weight parameters in the loss function were set to 10 for the initial condition (), the lower-boundary condition (), and the water flux upper-boundary condition (), while was set to 1 for the residual loss term. The difference between the PINN and the analytical solution in the volumetric water content is shown in the right column of Fig. a. Larger errors were observed near the initial and upper-boundary conditions, where strong nonlinearity exists due to the surface water flux. Except for this, PINNs could approximate the solution with high accuracy. Figure b showed the FDM solution with a spatial mesh cm and a time step h, which is comparable to the temporal resolution of the upper-boundary data points given to the PINN. In comparison with the FDM solution, the PINN solution was quite reasonable ( for the PINN and for the FDM solutions, respectively). Note that the degrees of freedom of the FDM solution were 101 000, while the number of parameters of the NN was 10 406, which demonstrates the memory efficiency of PINNs. Also, the number of residual points of PINNs was much smaller than the degrees of freedom of the FDM. However, increasing the number of residual points did not improve the PINN solution, as discussed in Sect. , while the FDM solution improved by further minimizing and ( was obtained for cm and h, as shown in Fig. S3). It is important to note here that an FDM with such a very fine mesh size requires solving a large linear system multiple times for each time step because the RRE is a nonlinear PDE, and the number of the iteration increases with decreasing , which leads to significant computational demand. This situation becomes worse for higher dimensions. Although we do not test PINNs for higher dimensions, other studies demonstrated the effectiveness of PINNs for higher dimensions . This is the main reason we see the potential of PINNs for a large-scale simulation based on the RRE.
Figure 3
Homogeneous soil. (a) Physics-informed neural network (PINN) solution in terms of volumetric water content [–] (left column). The neural network consisted of five hidden layers with 50 units with the layer-wise adaptive activation function. True analytical solution (center column) is given by (see Sect. ), and the difference between the PINN and true solutions is shown in the right column. (b) Numerical solution by a finite difference method with a spatial mesh of cm and a time step h.
[Figure omitted. See PDF]
3.1.3 Training PINNsA typical evolution of the loss terms in the loss function during the training (the Adam and L-BFGS-B algorithms) is shown in Fig. a. In most cases, the Adam algorithm gave a good minimum of the loss function, and the following L-BFGS-B algorithm met its termination criteria immediately (a spike was observed just before the termination). Among the loss terms, the residual loss remained high after the training (). indicates whether the RRE was satisfied in the spatial and temporal domain and determines the performance of PINNs. The characteristics of are further explored in Sect. . Figure b shows the evolution of the adaptive parameter for the L-LAAF for each hidden layer (Eq. ). The parameter changes the slope and the linear regime of activation functions, as shown in Fig. d. The parameter varied with the iterations of the Adam algorithm and reached its limiting value for each hidden layer. for the second hidden layer was the highest, and a smaller value was observed for hidden layers closer to the output layer.
Figure 4
Homogeneous soil. (a) The evolution of the loss terms for the initial condition , the upper-boundary condition , the lower-boundary condition , and the residual during the Adam (100 000 iterations) and the following L-BFGS-B training. (b) The evolution of the adaptive parameter for layer-wise locally adaptive activation functions (Eq. ) for each hidden layer (Layer 1 is next to the input layer).
[Figure omitted. See PDF]
Figure demonstrates how PINNs learn the solution to the RRE during the training. At the initialization (Fig. a), the PINN solution greatly differed from the true solution. However, PINNs quickly learned the lower-boundary condition (see Fig. b). Although the initial condition was given as data points, it took more iterations for PINNs to learn it because of the high nonlinearity near the surface induced by the water flux boundary condition. This corresponds to the increase in the adaptive parameter for the L-LAAF (see Fig. b). The limiting value of for the second hidden layer was 3.4, which makes the tanh function highly nonlinear and closer to the step function (see Fig. d). We concluded that the L-LAAF helped PINNs learn the highly nonlinear solution of the RRE. Figure clearly illustrated that PINNs first learned less complex parts of the solution and then captured the more complex parts. The tendency of feedforward NNs to learn less complex functions is called “spectral bias”, and demonstrated that this spectral bias caused PINNs to fail to learn complex solutions of PDEs. Further research is needed on how PINNs can learn more complex solutions of the RRE, for example, where wetting and drying cycles are studied.
Figure 5
Homogeneous soil. The evolution of physics-informed neural network (PINN) solution (points) during the training with the true solution (solid lines). (a) Initialization of PINNs. (b–g) 1000, 2000, 5000, 10 000, 20 000, and 40 000 iterations of the Adam algorithm. (h) The end of 100 000 iterations of the Adam algorithm and the following L-BFGS-B algorithm.
[Figure omitted. See PDF]
3.1.4 Effects of neural network architectureThe effects of the number of hidden layers and units of NNs as well as the use of the L-LAAF were investigated. The candidate numbers of hidden layers and units were 1, 2, 3, 4, 5, and 6 for hidden layers and 10, 20, 30, 40, 50, and 60 for units. The L-LAAF was turned on and off, and 10 different random seeds were used for the NN initialization for each NN architecture, which resulted in a total of 720 runs.
The summary of the results is shown in Fig. . In Fig. a and b, all the 720 data points are plotted, while the averaged data for each NN architecture are shown in Fig. c against the number of the weight parameters of each NN architecture. In Fig. a, smaller relative squared error values were observed for a larger number of hidden layers. Four hidden layers appeared to be enough for NNs to approximate the solution. As for the number of units, increasing the number gave smaller , though the effect seemed less relevant than that for hidden layers. It is clear from Fig. c that the use of the L-LAAF improved the performance of PINNs. From this analysis, we determined the best NN architecture to be five hidden layers with 50 units with the L-LAAF indicated by the arrow in Fig. c, whose solution is shown in Fig. .
Figure 6
Homogeneous soil. Relative squared error in terms of volumetric water content for different numbers of hidden layers (a) and units (b) with and without the use of the layer-wise locally adaptive activation function (L-LAAF). In panel (c), the averaged values were computed for each neural network architecture, and the values were plotted against the number of the weight parameters of each neural network. The arrow indicates the lowest averaged , which corresponds to neural networks with five hidden layers with 50 units used in Fig. .
[Figure omitted. See PDF]
3.1.5 Effects of weight parameters in loss functionThe effects of the weight parameters in the loss function (Eq. ) were studied by varying for the initial and boundary conditions, while was fixed to 1. We denote and as the weight parameters for the upper- and lower-boundary conditions, respectively. Five different values (1, 3, 10, 30, and 100) were tested for each weight parameter with 10 different NN initializations, which resulted in a total of 1250 simulations, where the NN architecture was fixed to five hidden layers with 50 units with the L-LAAF.
Figure shows the effects of the weight parameters for on the loss terms in the loss function and the PINN performance evaluated by the relative squared error . Each panel in the figure contains all the 1250 simulations, and the effects of the initialization and the are mixed. Figure a, e, and i demonstrated that higher-weight parameters attained lower values of the corresponding loss terms , which was expected. Another noticeable feature is that the loss term for the residual increased with the increasing , while was fixed to 1. It was considered that higher-weight parameters for the initial and boundary conditions minimized the emphasis on the residual loss term (see Fig. k and j). The increased led to less accurate solutions or higher , which is evident in Fig. m and j. These complicated trends make the PINN approach less consistent than traditional numerical methods. Note that automatic but empirical tuning of proposed by did not improve the results in our case, particularly with the L-LAAF, which is shown in Fig. S2. Because finding the optimal values for is not our primary purpose, we use the value of 10 for all three weight parameters for the following analysis.
Figure 7
Homogeneous soil. The effects of the weight parameters in the loss function (Eq. ) for the initial condition , the lower-boundary condition , and the upper-boundary condition on the loss terms , , , and for the initial and boundary conditions and the residual of the PDE, respectively, and the relative squared error in terms of volumetric water content . was fixed to 1.
[Figure omitted. See PDF]
3.1.6 Effects of number of residual and boundary data pointsThe effects of the number of residual points , where the residual of the RRE is evaluated, were investigated by varying the number , which resulted in 50 runs in total. NN architecture was fixed to five layers with 50 units with the L-LAAF. As expected, larger error was observed when a smaller number of residual points was used (see Fig. a). However, even if we increased the number to 30 000 and 100 000, the performance of PINNs did not improve. We concluded that this was due to the simultaneous but opposite effect of the number on the loss term , as shown in Fig. b. This was because increasing is equivalent to minimizing the importance of each residual point randomly selected in the spatial and temporal domain. Note that we tested the residual-based adaptive refinement algorithm proposed by , where additional residual points are added while training NNs based on the distribution of the residual values. As shown in Fig. S1a, the algorithm seemed to improve the performance of PINNs, but the effectiveness was minor. These findings demonstrate the difficulty in finding an optimal strategy to distribute residual points for PINNs to learn solutions to PDEs with high accuracy.
Figure 8
Homogeneous soil. (a) The effect of the number of residual points on the relative squared error . (b) The effect on the loss term for the residual .
[Figure omitted. See PDF]
Figure 9
Homogeneous soil. (a) The effect of the number of upper-boundary data on the relative squared error . (b) The effect on the loss term for the upper-boundary condition .
[Figure omitted. See PDF]
Also, the effects of the number of upper-boundary data points were studied, where . NN architecture and training algorithms were set to the same as Sect. . Figure a showed that more than 300 upper-boundary data points are necessary for PINNs to learn the solution well. This is because PINNs required enough upper-boundary data points to capture the surface flux in particular near the initial condition. However, increasing from 300 did not improve the performance of PINNs. At the same time, we observed the increase in the loss term for the upper-boundary condition , as shown in Fig. b. This observation is similar to the case for the residual points and makes it difficult to determine optimal . The difficulty in tuning parameters for PINNs is further explained in the next section.
3.1.7 Toward more consistent performance of PINNsWe demonstrated that PINNs can approximate the solution to the RRE with accuracy comparable to the FDM. However, a significant amount of effort was needed to tune the NN architecture and parameters, and those optimal settings depend on problems of interest, which makes it very challenging for PINNs to be consistent numerical solvers of PDEs. To understand why the performance of PINNs is not consistent, we investigated the relationships between and by compiling the results from the previous sections. The left column of Fig. corresponds to a fixed NN (five hidden layers with 50 units with the L-LAAF), and the center column is for NNs with varying architecture (the number of hidden layers and units) with the L-LAAF (from Sect. ), while the right column is for a fixed NN with varying weight parameters in the loss function (from Sect. ). Note that the dimension of the loss function (i.e., the number of adjustable parameters) is the same for the first and third columns because the NN architecture is the same, while the shape or landscape of the loss function is different because of the varying weight parameters .
As for the first column, the PINN solution was consistent regardless of the random seeds used for the NN initialization. Although there were some differences in the accuracy, a detailed examination of the PINN solutions revealed that the errors mainly came from near the upper boundary. Thus, we concluded that we obtained consistent PINN solutions for this problem once we determined the NN architecture. However, determining NN architecture is still empirical and depends on problems. Based on other studies and our experiences, NNs consisting of fewer than 10 hidden layers appear to be enough to approximate the solution to PDEs. However, the application of PINNs to PDEs with large spatial and temporal domains requires more investigation.
Figure k and l demonstrate that the residual loss is well correlated with , which coincides with the theoretical study by . This observation might indicate that smaller residual loss is more important than other loss terms. If this speculation is true, this implies the possibility of transfer learning, where NNs are pretrained with only the residual of PDEs without initial and boundary conditions and later fine-tuned by them, which could drastically reduce the computational work and needs more investigation.
Figure 10
Homogeneous soil. The relationships between loss terms and the relative squared error in terms of volumetric water content for a fixed neural network (NN) with different NN initializations (left column), NNs with varying architecture (center column), and a fixed NN with varying weight parameters in the loss function (right column). , , , and are the loss terms for the initial condition, lower- and upper-boundary conditions, and the residual of the PDE, respectively.
[Figure omitted. See PDF]
3.2 Heterogeneous soilIn this section, we simulated a one-dimensional infiltration into a two-layered soil with a length of 20 cm. Because each layer has a distinct saturated hydraulic conductivity , the solution to the RRE is not differentiable at the boundary of the layers. Thus, we implemented the domain decomposition method (see Sect. ) by dividing the spatial domain into the upper domain ( cm) and the lower domain ( cm). NNs and were assigned to and , respectively. The two NNs interact with each other through the interface conditions described in Sect. and were trained simultaneously, although separate training is also possible, as in .
We compared the PINN solution with an FEM solution obtained by HYDRUS-1D . FEMs are similar to PINNs in that both methods use some basis functions to approximate the solution of PDEs. While PINNs use NNs as the basis function, the FEM implemented in HYDRUS-1D uses a linear finite element as the basis function. Although HYDRUS-1D implements a variable time step, we used a constant time for the comparison. Because WRCs and HCFs defined by Eqs. () and () are not implemented in HYDRUS-1D, we used a lookup table feature to provide HUDRUS-1D with the WRC and HCF manually.
3.2.1 Problem setup
WRC and HCF relationships for the soils are the same for the homogeneous case. The saturated conductivity is 10.0 and 1.0 cm s for the upper layer (from to cm) and the lower layer (from to cm), respectively. Other parameters, , , and , for the two layers as well as the initial and boundary conditions are the same as the homogeneous case.
Figure 11
Heterogeneous soil. The saturated conductivity is 10.0 and 1.0 cm s for the upper and lower layer, respectively. (a) Physics-informed neural network (PINN) solution in terms of volumetric water content [–] obtained by two neural networks of five hidden layers with 50 units with the layer-wise adaptive activation function (left column). True analytical solution (center column) is given by (see Sect. ), and the difference between the PINN and true solutions is shown in the right column. (b) Numerical solution by a finite element method was obtained with a spatial mesh of cm and a time step h using HYDRUS-1D .
[Figure omitted. See PDF]
3.2.2 Characteristics of PINN solutionFigure a shows the PINN solution with the analytical solution introduced in Sect. . Both NNs and consisted of five hidden layers with 50 units with the L-LAAF, and was set to 1. Randomly sampled 10 000 residual points and equally spaced 101 initial data points were used for both NNs. The upper- and lower-boundary conditions were given as the case for the homogeneous soil. To connect the two NNs, randomly sampled 1000 points were used for the three interface continuity conditions: the water flux (Eq. ), the residual (Eq. ), and the NN output (Eq. ). All the weight parameters in the loss function were set to 10, while for the lower layer was set to 1. Figure b showed the FEM solution obtained using HYDRUS-1D with cm and h, which is comparable to the temporal resolution of the upper-boundary data points given to the PINNs. The PINN solution was superior to the HYDRUS-1D solution ( for the PINN and for the FEM solution, respectively), while the HYDRUS-1D solution underestimated the volumetric water content in the upper layer near the boundary, which coincides with . This is because only the matric potential continuity is guaranteed in HYDRUS-1D, while both matric potential and water flux continuity conditions were imposed for the PINNs. Also, HYDRUS-1D consistently overestimated the volumetric water content at the wetting front in the lower layer, while consistent errors were not observed for the PINN solution. From this comparison, we concluded that PINNs with domain decomposition can approximate the solution of the RRE for a two-layered soil with discontinuous hydraulic conductivity.
3.2.3 Training PINNs
The left column of Fig. shows the evolution of the loss terms. All the loss terms for both layers remained higher than those for the homogeneous case (see Fig. ), which demonstrated the difficulty in training PINNs for the layered soil case. While the Adam algorithm resulted in a well-approximated solution for the homogeneous case, the L-BFGS-B algorithm played an important role for the heterogeneous case, particularly in reducing the loss term for the upper-boundary condition and the residual for the upper layer. Figure c illustrated that the three interface conditions were satisfied well.
The right column of Fig. shows how the adaptive parameters for the L-LAAF changed during the training. We expected for the lower layer to be similar to the homogeneous case because the solution for the lower layer is similar to the homogeneous case. However, Fig. b showed that for the lower layer was much smaller than that for the homogeneous case, while similar trends of for different hidden layers were observed (i.e., Layer 2 was the highest). for all the layers of both layers reached their limiting values after approximately 20 000 to 30 000 iterations of the Adam algorithm, which coincided with the homogeneous case and . This indicated that if we can find a better initial guess of for each layer, we may be able to speed up the training of PINNs, which requires further research.
Figure 12
Heterogeneous soil. (a) The evolution of the loss terms in the loss function (left column) and adaptive parameters for the layer-wise locally adaptive activation function (Eq. ) for each hidden layer (right column) during the Adam algorithm (100 000 iterations) and the following L-BFGS-B training for the neural network for the upper layer. Here, Layer 1 is next to the input layer. (b) The same for the lower layer. (c) The evolution of the loss terms for the interface conditions.
[Figure omitted. See PDF]
Figure demonstrated how PINNs learned the solution. At the initialization (Fig. a), there are discontinuities at the boundary, which is evident for h. The continuity conditions and the lower-boundary condition were quickly met. The PINNs started to capture the flow of soil moisture at the 20 000 iterations of the Adam algorithm (Fig. e), which coincided with when the adaptive parameters for the L-LAAF reached their limiting values (see the right column of Fig. ). Even at the end of the Adam algorithm, there are large errors in the PINN solution near the surface and wetting fronts in the lower layer (Fig. g). These errors were further minimized by the L-BFGS-B algorithm (Fig. h). This demonstrated that a second-order method such as the L-BFGS-B algorithm is necessary to train PINNs when the solution to PDEs is complicated.
Figure 13
Heterogeneous soil. The evolution of the PINN solution during the training. (a) Initialization of the PINNs. (b–f) 1000, 5000, 10 000, 20 000, and 40 000 iterations of the Adam algorithm. (g) The end of 100 000 iterations of the Adam algorithm. (h) The end of the L-BFGS-B algorithm.
[Figure omitted. See PDF]
3.2.4 Effects of number of interface points and weight parameters in loss functionWe investigated the effects of the number of interface points on PINN performance. The number varied from 100, 300, 1000, 3000, and 10 000, and 10 different random seeds were used for each setting. Figure a–c showed that the effects on the loss terms for the interface conditions were negligible. The relative squared error decreased with the number from 100 to 300, but the effect was negligible for larger (see Fig. d). Thus, we concluded that the effects were minor and used 1000 interface points in the following analysis.
Figure 14
Heterogeneous soil. The effects of the number of interface data points . (a) The loss term for the continuity of the residual . (b) The loss term for the continuity of the water flux . (c) The loss term for the continuity of the neural network output . (d) The relative squared error with respect to the volumetric water content .
[Figure omitted. See PDF]
We also investigated the effects of the weight parameters in the loss function. We fixed for the lower layer to be 1 and for the initial condition and the lower-boundary condition for the lower layer to be 10, while values for the upper layer and the interface conditions were varied from 1, 10, and 100 (i.e., for the different loss terms in the upper layer are the same). A total of 10 different initializations were conducted. Figure a shows PINNs were more likely trapped by a local minimum of the loss function when for the upper layer was smaller, indicated by the cloud of the data points. However, the best PINN solution appeared not to be affected by . Similar observations were made for the effects on the loss terms corresponding to the upper layer (see Fig. S4). Figure b illustrated the effects of for the interface conditions. When was larger, the PINNs produced worse solutions, and it was evident that PINNs suffered from a local minimum of the loss function. Similar conclusions were made from the effects on the loss terms for the upper layer (see Fig. S4). These observations led us to conclude that choosing the right weight parameters in the loss function is very important and challenging for the heterogeneous case to achieve accurate and consistent solutions to the RRE.
Figure 15
Heterogeneous soil. The effects of weight parameters in the loss function on the performance of PINNs with respect to the relative squared error of the volumetric water content . (a) Upper layer. (b) Interface conditions.
[Figure omitted. See PDF]
4 Inverse modelingIn this section, we demonstrate that inverse modeling can be easily implemented using PINNs. Here, we aim to estimate a surface water flux (i.e., upper-boundary condition) from near-surface moisture measurements in a layered soil. Surface water flux is the result of precipitation, evaporation, and surface runoff and thus essential information for land surface modeling and groundwater management. Although rainfall measurements can be used as surface water flux, the measurements are generally spatially scarce and noisy. Therefore, it is important to estimate surface water flux from near-surface soil moisture data. In this line of research, employed the analytical solution of the linearized RRE, and proposed a deterministic inverse algorithm to estimate surface water flux given past surface water flux. used a simple soil water balance equation to estimate rainfall. These studies assumed soil hydraulic properties are homogeneous . Here, we present an inverse framework based on PINNs to estimate surface water flux from near-surface soil moisture measurements in a two-layered soil, as an extension of our previous work .
4.1 Problem setup
We consider a one-dimensional soil moisture dynamics in a layered soil. The upper layer is a 10 cm depth of loam soil (0 to cm), and the lower layer is a 10 cm depth of sandy loam ( to cm). The WRCs and HCFs of the soils for are represented by the van Genuchten–Mualem model : where is the residual water content [L L], is the saturated water content [L L], [L] and [–] determine the shape of the WRC, is the saturated hydraulic conductivity [L T], is the tortuosity parameter [–], , and is the effective saturation, defined as
33 The parameters for the two soils were set in the following way: , , , , and for the loam soil (upper layer) and , , , , and for the sandy loam soil (lower layer). The lower boundary is a constant pressure head set to cm, and the upper-boundary condition is a variable surface water fluxes as follows: cm h from to h; 0.02 cm h from to h; and cm h from to h. The positive and negative values represent evaporation and infiltration, respectively. The initial condition was set to cm for all the depths. To generate synthetic data for the abovementioned scenario, we employed HYDRUS-1D to compute the numerical solution of the RRE with h and cm. The numerical solution by HYDRUS-1D is not necessarily accurate because it may contain undesirable numerical errors near the interface, as observed in Sect. 3.2, although we confirmed the global mass balance of the solution. We further added a Gaussian noise with a mean of 0 and a standard deviation of 0.005 to the numerical solution. We sampled the simulated noisy synthetic data at predetermined locations to mimic soil moisture measurements by in situ sensors. We tested three patterns of the measurement locations [cm]: , , and . The temporal resolution of the measurements was 0.1 h.
4.2 PINNs inverse solutionTo infer the surface water flux upper-boundary condition, we constructed PINNs with domain decomposition. The two NNs consisted of five hidden layers with 50 units, as in Sect. , and was used for the output of both NNs (Eq. ). Unlike the forward modeling, the initial and boundary data points were not used, and the sampled synthetic data points and randomly sampled collocations points were only used to train the NNs. Therefore, the loss function is the sum of the loss term for the measurement data , the residual of the RRE , and the three interface conditions (, , and ). As for the weight parameters for each loss term, for the measurement data for both layers and the residual loss for the upper layer, and for the interface conditions, while for the residual loss for the lower layer as a reference. We tested 10 different NN initializations for each measurement scheme, and thus a total of 30 simulations were conducted. Note that the surface water flux was estimated by evaluating Eq. () with the solution of the RRE by PINNs.
Figure showed the best-recovered solution and estimated surface upper-boundary condition for the three measurement schemes. As expected, more accurate recovered solutions were obtained from dense soil moisture measurements (see also Fig. S7). However, interestingly, NNs more likely were trapped by bad solutions when more measurement data were given (see Fig. S7). This means a large amount of data does not necessarily lead to a good performance of PINNs because large data make the training of PINNs more difficult. This is an important practical point and requires further investigation. As for the two measurement location scheme (Fig. a), the wetting front reached the top measurement point ( cm) at approximately h, which coincided with when the estimated surface water flux was reasonable. After that time, both the recovered solution and estimated surface water flux were quite reasonable. Similar trends were observed for the other two measurement schemes (Fig. b and c). This suggested that we need soil moisture measurement data closer to the surface ( cm) to capture the wetting front and the infiltration rate. Figure S8 in the Supplement showed the evolution of the loss terms corresponding to the measurement data, residual, and the interface conditions. Although the direct comparison between the forward and inverse modeling is not possible because the problem settings are different, we observed smaller residual loss terms for the inverse modeling. This observation and our experiences indicate that PINNs are more effective for the inverse problem than the forward modeling because data points inside the spatial and temporal domain are more informative than initial and boundary conditions for NNs to find the solution to PDEs.
Figure 16
Inverse modeling to estimate surface water flux from soil moisture measurements in a layered soil (upper layer– loam soil and lower layer – sandy loam soil). True solution generated by HYDRUS-1D and the recovered solution by PINNs (left column) and the true and estimated upper surface water flux boundary condition (right column) for different measurement locations [cm]. (a) . (b) . (c) .
[Figure omitted. See PDF]
We note that soil hydraulic parameters are known for both layers in this test, which is not the case for field applications. implemented PINNs with a global optimization algorithm to estimate the van Genuchten parameters of a homogeneous soil (, , and ) from soil moisture measurements, and the framework was tested against for both synthetic and laboratory infiltration experiment data. They demonstrated that PINNs with a global optimization algorithm could determine the van Genuchten parameters for a homogeneous soil. In fact, the current study was motivated by the need to verify a PINN approach to estimate such soil hydraulic parameters of a layered soil for field applications. Our next research objective is to implement PINNs that can estimate both the upper surface boundary condition and soil hydraulic parameters (e.g., van Genuchten parameters) for layered soils and test them with soil moisture and surface water flux data measured in a lysimeter. Note that the estimation of surface water flux was reasonable when the value is positive (i.e., evaporation) in the example, but it would probably not be the case for field applications because evaporation requires coupled heat and water transport models. Applying PINNs to multi-physics in unsaturated hydrology is also our next research step.
5 Advantages and disadvantages of PINNsRegardless of the potential of PINNs to solve PDEs, several studies reported their failures and limitations . Although there are some theoretical studies on the convergence and error analysis of PINNs
One main drawback of PINNs for forward modeling is their computational time. For our case studies, it took approximately 30 and 90 min for the homogeneous and heterogeneous forward modeling using a desktop computer with GPU (NVIDIA GeForce RTX 2060), while it took less than 1 min for HYDRUS-1D to solve the heterogeneous problem. PINNs might be more competitive for large-scale hydrology problems, which needs further investigation.
For forward modeling, PINNs can treat initial and boundary conditions as data points. This feature is advantageous over traditional numerical methods because the accurate initial and boundary conditions are virtually impossible to obtain in practical conditions. On the other hand, traditional methods can take into account the uncertainties of the initial and boundary conditions through the Bayesian approach, while it requires solving the forward problem many times. Although uncertainty quantification through PINNs is open and challenging questions , the capability of PINNs to deal with noisy and incomplete initial and boundary conditions is noteworthy.
Traditional numerical methods require mesh generation, which can be tedious when the spatial domain is complicated. On the other hand, PINNs can be easily modified to accommodate such complicated geometries . However, it is challenging to correctly impose boundary conditions on PINNs while they are imposed softly in the loss function in this study. Regarding hydrology application, system-dependent boundary conditions such as ponding and evaporation conditions are challenging to implement because PINNs solve PDEs in spatial and temporal domains simultaneously, rather than sequentially, as in traditional time-stepping methods. This difficulty may be a technical issue, but the loss function would be more complicated with such system-dependent boundary conditions, and thus training NNs would be more difficult.
One of the main challenges of PINNs is training PINNs for large-scale modeling. In particular, a long-term simulation such as wetting and drying cycles requires PINNs to approximate very complicated functions. We may sequentially train PINNs in time but lose the ability of PINNs to solve PDEs simultaneously in space and time, and numerical and optimization errors accumulate with time stepping.
The application of PINNs to multi-scale and multi-physics problems is currently challenging, although there are some pioneering studies in hydrology . It is known that the solution of PINNs to a multi-scale problem is not always accurate, even for simple problems, because NNs tend to learn “easy” or low-frequency parts of the solution . Although the study was only concerned with water flow in unsaturated soils, near-surface soil moisture dynamics is essentially coupled heat and water transport. Therefore, further research is needed for the application of PINNs to multi-physics simulations in unsaturated soils.
One advantage of PINNs specific to the RRE is that there is no need for temporal discretization, which results in mass balance issue . Also, PINN solutions are differentiable and thus can be used to derive water flux easily without post-processing, as in . Furthermore, PINNs can store the solutions of PDEs efficiently with a smaller number of degrees of freedom, particularly for high dimensions . These merits can make PINNs a good candidate for a numerical solver of large-scale modeling based on the RRE. Nevertheless, these advantages depend on how accurately PINNs satisfy the RRE.
In terms of inverse modeling, PINNs have some interesting features. First, PINNs do not have to solve the forward modeling to solve the inverse problem. On the other hand, standard inverse methods require solving the forward modeling many times to adjust parameters of interest. This feature makes the computation of PINNs efficient. However, as shown in the study, PINNs do not precisely impose PDEs constraints as traditional methods, where the forward modeling is actually solved. Further research is needed to minimize the residual loss term so that known physics is precisely imposed. Second, when estimating boundary conditions as in the study or initial condition from data, PINNs do not require the discretization of those target functions. In traditional methods, it is common to represent the target functions as a linear combination of some basis functions (e.g., finite elements) and estimate the coefficients of the basis functions. In PINN framework, such discretization is not necessary, and those target function values can be evaluated directly from NNs. Overall, although PINNs have interesting and attractive characteristics, fully utilizing the potential requires further research. In the next section, future perspectives of PINNs are mentioned.
6 Conclusions and future perspectives
We presented a numerical method based on neural networks (NNs), called physics-informed neural networks (PINNs), to solve the Richardson–Richards equation (RRE) to simulate water flow in unsaturated homogeneous and heterogeneous soils. We tested recently proposed PINN algorithms on our problems and found that the layer-wise locally adaptive activation function (L-LAAF) developed by was effective. The L-LAAF changes the slope and the linear regime of the activation functions in NNs and helps PINNs approximate the solution of the RRE well. First, we tested the PINN approach for the homogeneous soil case. By comparing the PINN solution to the analytical solution by and the numerical solution by a finite difference method, we demonstrated that “well-trained” PINNs can be competitive in terms of accuracy and memory efficiency. However, training PINNs requires significant efforts to tune various parameters of NNs, including NN architecture and weight parameters in the loss function. We systematically investigated the effects of those parameters on the performance of PINNs and demonstrated that those interrelated effects make PINN approach less consistent. Although some automatic but empirical algorithms to tune those parameters improved the performance of PINNs to some extent, it was difficult for PINNs to consistently obtain solutions to the PDE with high accuracy, and the results were strongly dependent on the initialization of NNs. Our empirical but comprehensive observations provide some suggestions on the choice of the parameters, but we do not think they can be applied to various cases, and thus further studies are necessary.
We tested PINN approach for a layered soil, where hydraulic conductivity is discontinuous across the layer boundary. The analytical solution by was used to verify the PINN solution. We demonstrated that PINNs with domain decomposition proposed by successfully approximated the solution of the RRE for a two-layered soil. The comparison with a finite element method using popular software, HYDRUS-1D , was made. The PINN solution was superior to HYDRUS-1D for the problem because the interface conditions on the layer boundary were well imposed for the PINN approach but not for HYDRUS-1D. Nevertheless, the study demonstrated that obtaining PINN solutions for the problem with consistent accuracy was challenging because of the difficulty in choosing the right weight parameters in the loss function, which determines the relative importance of physical constraints for the problem (e.g., initial and boundary conditions).
We further applied the PINNs with domain decomposition to the inverse modeling to estimate a water flux upper-boundary condition from noisy sparse soil moisture measurements. The inverse modeling was easily formulated by the PINN approach, and the effects of the measurement schemes were studied. The upper-boundary condition was reasonably inverted from the noisy data, in particular when measurement data near the soil surface were available. However, our results demonstrated that a large amount of data do not necessarily lead to a good performance of PINNs because training PINNs is more difficult with more data. Further research is needed to make PINNs learn from a larger amount of data and simultaneously determine both soil hydraulic properties and surface water flux for layered soils.
The PINN algorithm presented here is focused on water flow in unsaturated soils. However, there are many situations where we need to simulate saturated–unsaturated flow, such as ponding conditions and interactions with groundwater. Our preliminary tests gave positive results on the extension of PINNs to saturated–unsaturated cases. Readers may refer to the author comment in the interactive public discussion .
PINNs have the potential to solve issues traditional numerical methods cannot solve by leveraging the capability of NNs to approximate complex functions efficiently. However, the mathematical complexities of the forward and inverse problems are lumped into a complicated nonlinear, non-convex minimization problem. Whether PINNs can perform well depends on whether we can solve the resulting minimization problem well. Another difficulty comes from the fact that PINNs have an unusual regularization term in the loss function as the form of the residual of PDEs. This term is very different from standard regularization terms such as L1 and L2 regularizations because it contains the derivative of the output of NNs with respect to their input. The mathematical and exploratory investigation of the minimization problem and the regularization term is necessary for further improvements of PINNs. The investigation may include a vast amount of literature on NNs and PDE constrained optimization. There are many methods and findings that have not been well tested against PINNs, including transfer learning, second-order optimization methods, and the correspondence with adjoint-state methods . We will investigate those areas to improve the understanding of PINNs and use PINNs for large-scale modeling in hydrology.
Aside from PINNs, the latest research trends have been directed toward learning the “operator” of PDEs rather than their solutions given initial and boundary conditions, as in the study. This new research field has been led by two main groups , and they aim to develop operator learning methods applicable to general PDEs, of course including the ones in hydrology. Do we wait until they develop general PDE simulators or provide a unique perspective in soil physics and hydrology? We need to consider how we contribute to the rapid progress of the fields as domain scientists .
Appendix A List of abbreviations
| FDMs | Finite difference method |
| FEMs | Finite element methods |
| GPUs | Graphics processing units |
| HCF | Hydraulic conductivity function |
| L-LAAF | Layer-wise locally adaptive activation |
| function | |
| ML | Machine learning |
| NNs | Neural networks |
| PDE | Partial differential equation |
| PINNs | Physics-informed neural networks |
| RRE | Richardson–Richards equation |
| WRC | Water retention curve |
Appendix B List of notation
| Superscript | |
| prediction except for used for | |
| dimensionless for the analytical solutions | |
| Subscript | |
| Dirichlet boundary condition | |
| water flux boundary condition | |
| interface | |
| interface condition regarding the continuity | |
| of neural network output | |
| interface condition regarding the continuity | |
| of water flux | |
| interface condition regarding the continuity | |
| of the residual of the RRE | |
| interface condition regarding the continuity | |
| of water potential | |
| initial condition | |
| lower-boundary condition | |
| lower layer | |
| measurement data | |
| residual | |
| upper-boundary condition | |
| upper layer | |
| Alphabet | |
| the collection of trainable parameters for | |
| adaptive activation functions for a neural | |
| network | |
| trainable parameter for the element-wise | |
| nonlinear activation function | |
| the collection of bias vectors for a neural | |
| network | |
| bias vector for the th hidden layer | |
| Alphabet | |
| time step for finite difference and finite | |
| element solutions [T] | |
| spatial mesh for finite difference and finite | |
| element solutions [L] | |
| initial condition | |
| Dirichlet boundary condition | |
| total water head [L] | |
| vector for the th hidden layer | |
| water flux boundary condition | |
| hydraulic conductivity [L T] | |
| saturated hydraulic conductivity [L T] | |
| tortuosity parameter [–] | |
| the number of hidden layers | |
| loss function | |
| loss term for constraints | |
| van Genuchten parameter | |
| dimension of a vector corresponding to the | |
| th hidden layer | |
| van Genuchten parameter [–] | |
| dimension of input vector | |
| dimension of output vector | |
| the number of points for constraints | |
| neural network functions | |
| output functions | |
| water flux in the vertical direction (positive | |
| downward) [L T] | |
| water flux in three dimensions [L T] | |
| constant water flux at the surface to determine | |
| the initial condition of the analytical solutions | |
| [L T] | |
| constant water flux at the surface used in the | |
| analytical solutions [L T] | |
| the residual of the RRE | |
| fixed scaling factor for adaptive activation | |
| functions | |
| source term [T] | |
| time [T] | |
| final time [T] | |
| the collection of the weight matrices for a | |
| neural network | |
| weight matrix for the th hidden layer | |
| input vector | |
| output vector | |
| vertical coordinate or elevation head (positive | |
| upward) [L] | |
| the vertical length of a soil [L] | |
| Greek alphabet | |
| pore-size distribution parameter [L] | |
| van Genuchten parameter [L] | |
| fixed parameter for the output of neural | |
| networks | |
| relative squared error in terms of volumetric | |
| water content | |
| volumetric water content [L L] | |
| Greek alphabet | |
| residual volumetric water content [L L] | |
| saturated volumetric water content [L L] | |
| neural network parameters | |
| update of neural network parameters for each | |
| iteration of optimization algorithms | |
| infinite sequence to compute the analytical | |
| solution | |
| weight parameters in the loss function | |
| element-wise nonlinear activation function | |
| water potential in soils [L] | |
| water potential at the bottom boundary [L] | |
| spatial domain | |
| spatial boundary | |
| Others | |
| equal by definition | |
| nabla | |
Code and data availability
All the Python codes and data to reproduce the results are available at 10.5281/zenodo.6030635 .
The supplement related to this article is available online at:
Author contributions
TB contributed to the conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, original draft preparation, and review and editing. TAG contributed to the conceptualization, funding acquisition, methodology, supervision, visualization, and review and editing.
Competing interests
The contact author has declared that neither of the authors has any competing interests.
Disclaimer
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements
The publicly available code of physics-informed neural networks provided by Sifan Wang, Yujun Teng, and Paris Perdikaris (University of Pennsylvania) was instrumental in the development of our model. We are indebted to the editor, Nunzio Romano, Silvio Gumiere, and the two anonymous reviewers for improving this paper.
Review statement
This paper was edited by Nunzio Romano and reviewed by Silvio Gumiere and two anonymous referees.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2022. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Modeling water flow in unsaturated soils is vital for describing various hydrological and ecological phenomena. Soil water dynamics is described by well-established physical laws (Richardson–Richards equation – RRE). Solving the RRE is difficult due to the inherent nonlinearity of the processes, and various numerical methods have been proposed to solve the issue. However, applying the methods to practical situations is very challenging because they require well-defined initial and boundary conditions. Recent advances in machine learning and the growing availability of soil moisture data provide new opportunities for addressing the lingering challenges. Specifically, physics-informed machine learning allows both the known physics and data-driven modeling to be taken advantage of. Here, we present a physics-informed neural network (PINN) method that approximates the solution to the RRE using neural networks while concurrently matching available soil moisture data. Although the ability of PINNs to solve partial differential equations, including the RRE, has been demonstrated previously, its potential applications and limitations are not fully known. This study conducted a comprehensive analysis of PINNs and carefully tested the accuracy of the solutions by comparing them with analytical solutions and accepted traditional numerical solutions. We demonstrated that the solutions by PINNs with adaptive activation functions are comparable with those by traditional methods. Furthermore, while a single neural network (NN) is adequate to represent a homogeneous soil, we showed that soil moisture dynamics in layered soils with discontinuous hydraulic conductivities are correctly simulated by PINNs with domain decomposition (using separate NNs for each unique layer). A key advantage of PINNs is the absence of the strict requirement for precisely prescribed initial and boundary conditions. In addition, unlike traditional numerical methods, PINNs provide an inverse solution without repeatedly solving the forward problem. We demonstrated the application of these advantages by successfully simulating infiltration and redistribution constrained by sparse soil moisture measurements. As a free by-product, we gain knowledge of the water flux over the entire flow domain, including the unspecified upper and bottom boundary conditions. Nevertheless, there remain challenges that require further development. Chiefly, PINNs are sensitive to the initialization of NNs and are significantly slower than traditional numerical methods.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details
1 Life and Environmental Science Department, University of California, Merced, CA, USA





