1 Introduction
Inverse modeling is used across the Earth sciences, engineering, and medicine to estimate a quantity of interest when there are no direct observations of that quantity but only observations of related quantities.
In many applications of inverse modeling, multiple data sources can be used as prior knowledge to help predict the unknown quantity. For example, in environmental inverse problems, there are increasing numbers of satellite sensors that provide detailed data on both the built and natural environment and can serve as predictors of pollution emissions, or there may be multiple models that provide different predictions of the unknown quantity. However, the increasing availability of prior information or predictor variables for inverse modeling can be both a blessing and a curse. On one hand, the existence of more predictor variables can help increase the accuracy of the posterior estimate. On the other hand, the existence of so many predictor variables or prior information can necessitate difficult choices about which to include or exclude from the inverse model.
Throughout this paper, we draw upon case studies from atmospheric inverse modeling (AIM) to highlight the challenges of handling multiple predictor variables. In AIM, the primary goal is to estimate surface fluxes of a greenhouse gas or air pollutant using observations of gas mixing ratios in the atmosphere collected from airplanes, TV towers, or satellites. A model of atmospheric winds is usually required to quantitatively link surface fluxes with downwind observations in the atmosphere.
One solution to this challenge is to choose a single predictor variable or a single biogeochemical model to construct the prior of the inverse model, as is common in the existing inverse modeling literature. An alternative, which we adopt throughout this paper, is to model the unknown quantity as the sum of a deterministic component built from predictor variables and a stochastic component:
$$\mathbf{s} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\zeta}, \qquad (1)$$
where $\mathbf{s} \in \mathbb{R}^{m}$ is a vector representing the unknown quantity of interest (e.g., spatial or spatiotemporal maps of emissions of pollution), $m$ indicates the total number of unknowns to estimate (e.g., at different locations and at different times), $\mathbf{X} \in \mathbb{R}^{m \times p}$ contains the $p$ predictor variables as columns, and $\boldsymbol{\zeta}$ is referred to here as the stochastic component. It captures variability in the unknown quantity that is not described by the predictor variables. In this setup, the stochastic component is estimated as part of the inverse model along with the set of coefficients or scaling factors that describe the relationships between the predictor variables and the quantity of interest ($\boldsymbol{\beta}$). We use the formulation in Eq. (1) throughout this paper because it affords more flexibility to incorporate a greater number of predictor variables.
This framework facilitates the use of more than one predictor variable, but the inclusion of all possible predictors within an inverse model has numerous pitfalls. First, many of the biogeochemical models or environmental variables described above are collinear, meaning that they are highly correlated with one another. Collinearity can cause numerous problems in linear modeling because the model has difficulty determining how to weight very similar or non-unique predictor variables.
Model selection techniques have been considered for obtaining important and relevant predictors in inverse problems. These techniques include the partial $F$ test and the Bayesian information criterion (BIC), among other model selection methods (see Sect. 2). However, these approaches can be computationally expensive, especially for large-scale problems with many candidate predictors.
Summary of challenges and contributions
Making an informed decision about which predictor variables or prior information to incorporate or discard in the inverse model is important yet very challenging, as it is often not straightforward to distinguish between informative and non-informative variables. In addition, choosing a subset of $k$ predictor variables out of $p$ possible available variables can involve $\binom{p}{k}$ comparisons, which is computationally challenging even for modest values of $k$ and $p$.
In this study, we develop a new approach for incorporating prior information in an inverse model using predictor variables while simultaneously selecting the relevant predictor variables for the estimation of the unknown quantity of interest. Following a Bayesian approach, we focus on efficiently computing the maximum a posteriori (MAP) estimate of the posterior distribution for the unknown quantity of interest as well as the predictor coefficients ($\boldsymbol{\beta}$). Note that this algorithm can also be applied to spatial interpolation problems where there are multiple predictor variables (i.e., universal kriging), and we describe the implementation for kriging problems where applicable throughout the paper. Overall, this approach has three main contributions.
-
We develop a more comprehensive statistical model, a joint model for the unknown quantity of interest ($\mathbf{s}$) and the predictor coefficients ($\boldsymbol{\beta}$), with sparsity-promoting priors on the coefficients $\boldsymbol{\beta}$, that enables model selection.
-
We adapt an existing iterative algorithm to simultaneously estimate both $\mathbf{s}$ and $\boldsymbol{\beta}$. This algorithm requires only a “single step” to select a set of predictors and solve the inverse problem (i.e., compute estimates for $\mathbf{s}$ and $\boldsymbol{\beta}$). The proposed algorithm also automatically selects the regularization parameters on the fly.
-
We evaluate this algorithm using several examples from AIM, including examples drawn from NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite, and compare against existing model selection techniques on small and moderate-sized problems. For the largest problem we consider, existing model selection techniques cannot be run in a reasonable amount of time.
Details on notation are provided in the Appendix. Unless otherwise specified, vectors are denoted with boldfaced italic lowercase letters (e.g., $\mathbf{s}$), matrices are denoted with boldfaced capital letters (e.g., $\mathbf{X}$), subscripts are used to index columns of matrices or the iteration count, and $^{\top}$ denotes the transpose operation.
2 Existing strategies for model selection in inverse problems

There are different strategies to decide which predictor variables or prior information to include within this inverse (or kriging) model, operating under the assumptions described in Eqs. (1) and (2). The first approach is to choose predictor variables or prior information based on expert judgment. For example, perhaps a modeler trusts one biogeochemical CO2 or CH4 model more than others – based on either previous analysis or existing literature. A downside of this approach is that it often necessitates making subjective decisions that are not necessarily based on the available atmospheric CO2 or CH4 observations.
The second approach used in several inverse studies is model selection. This class of methods is also frequently used in regression analysis and linear modeling.
Model selection operates on the observation equation
$$\mathbf{z} = \mathbf{H}\mathbf{s} + \boldsymbol{\varepsilon}, \qquad (2)$$
where $\mathbf{z} \in \mathbb{R}^{n}$ is a vector of observations so that the variable $n$ indicates the total number of available observations. Here, $\mathbf{H}$ represents a physical model that relates the unknown quantity $\mathbf{s}$ to the observations, and $\boldsymbol{\varepsilon}$ represents noise or errors, including errors in the observations and in the physical model $\mathbf{H}$. In the present study, as is the prevalent approach, we model this error as normally distributed with zero mean and covariance matrix $\mathbf{R}$. Note that for the kriging case, $\mathbf{H}$ contains a single value of $1$ in each row; this entry links the observation associated with that row to the column that corresponds to the matching location in the unknown space. All remaining elements of $\mathbf{H}$ are set to $0$. Given these relationships, the goal of model selection is to find a set of predictor variables ($\mathbf{X}$, Eq. 1) that best improves the fit of the model against the available observations $\mathbf{z}$. Specifically, model selection will typically reward combinations of predictor variables that are a better fit against available observations and penalize models for increasing complexity (i.e., for increasing numbers of predictor variables).
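To make the kriging structure of $\mathbf{H}$ concrete, the following minimal MATLAB sketch builds such a selection matrix; the sizes and index values are illustrative assumptions, not quantities from this paper:

```matlab
% Kriging case: row i of H selects the entry of s at the location of
% observation i. The index vector and sizes below are illustrative.
m   = 12;                          % number of unknown locations
idx = [4 10 3];                    % idx(i): location sampled by observation i
n   = numel(idx);
H   = sparse(1:n, idx, 1, n, m);   % one 1 per row, zeros elsewhere
```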
Existing model selection methods usually determine this fit using the weighted sum of squared residuals (WSS),
$$\mathrm{WSS} = \left(\mathbf{z} - \mathbf{H}\mathbf{X}\hat{\boldsymbol{\beta}}\right)^{\top}\boldsymbol{\Psi}^{-1}\left(\mathbf{z} - \mathbf{H}\mathbf{X}\hat{\boldsymbol{\beta}}\right), \quad \text{with} \quad \boldsymbol{\Psi} = \mathbf{H}\mathbf{Q}\mathbf{H}^{\top} + \mathbf{R}, \qquad (3)$$
where $\hat{\boldsymbol{\beta}}$ denotes the generalized least-squares estimate of the coefficients for the candidate set of predictors.
Common model selection methods include the partial $F$ test (also called the variance ratio test), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). The partial $F$ test can only be used to compare two candidate models at a time: a model with a smaller number of predictor variables, with $p_{1}$ predictors, and a model with a larger number of predictor variables, with $p_{2} > p_{1}$ predictors. The test statistic
$$F = \frac{\left(\mathrm{WSS}_{1} - \mathrm{WSS}_{2}\right)/\left(p_{2} - p_{1}\right)}{\mathrm{WSS}_{2}/\left(n - p_{2}\right)} \qquad (4)$$
is then compared against a critical value of the $F$ distribution to decide whether the additional predictors yield a statistically significant improvement in fit.
The AIC and BIC, by contrast, operate on a different principle. A modeler will often calculate an AIC or BIC score for every possible combination of predictor variables $\mathbf{X}_{k}$:
$$\mathrm{AIC} = \ln\left(\det\boldsymbol{\Psi}\right) + \mathrm{WSS} + 2k, \qquad (5)$$
$$\mathrm{BIC} = \ln\left(\det\boldsymbol{\Psi}\right) + \mathrm{WSS} + k\ln n, \qquad (6)$$
where $\mathbf{X}_{k}$ is a subset of $k$ columns of $\mathbf{X}$, $\ln$ refers to the natural logarithm (with base $e$), and $\det\boldsymbol{\Psi}$ denotes the determinant of the matrix $\boldsymbol{\Psi}$. The combination of predictor variables with the lowest AIC or BIC score is deemed the best model. Note that these two approaches are conceptually similar but have different penalty terms for model complexity: the penalty in the BIC depends on the number of observations ($n$), while the AIC penalty does not.
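As an illustration of Eqs. (3)–(6), the following MATLAB sketch performs an exhaustive BIC-style search on a small synthetic problem. All sizes, variable names, and covariance choices are illustrative assumptions, not the configuration used in this paper:

```matlab
% Exhaustive BIC-style model selection on a small synthetic problem.
rng(0);
n = 50; m = 80; p = 4;                      % observations, unknowns, predictors
H = randn(n, m);  X = randn(m, p);
Q = eye(m);  R = 0.1*eye(n);                % simple covariances for illustration
beta_true = [1; 0; -2; 0];                  % sparse true coefficients
z = H*(X*beta_true) + sqrt(0.1)*randn(n, 1);

Psi = H*Q*H' + R;                           % covariance of z under the model
logdetPsi = 2*sum(log(diag(chol(Psi))));    % stable log-determinant of Psi
subsets = {[]};                             % enumerate all 2^p subsets
for k = 1:p
  c = nchoosek(1:p, k);
  for i = 1:size(c, 1), subsets{end+1} = c(i, :); end %#ok<AGROW>
end
bestScore = inf; bestSet = [];
for t = 1:numel(subsets)
  S = subsets{t};  k = numel(S);
  if isempty(S)
    r = z;
  else
    A = H*X(:, S);                          % mapped predictors for this subset
    bhat = (A'*(Psi\A)) \ (A'*(Psi\z));     % generalized least-squares estimate
    r = z - A*bhat;
  end
  wss = r'*(Psi\r);                         % weighted sum of squared residuals
  score = logdetPsi + wss + k*log(n);       % BIC-style score (AIC: + 2*k)
  if score < bestScore, bestScore = score; bestSet = S; end
end
disp(bestSet)                               % ideally recovers columns [1 3]
```

Even in this toy setting, the loop visits all $2^{p}$ subsets, which is exactly the scaling that becomes prohibitive for large $p$.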
An upside of statistical model selection is objectivity; this approach can be used to choose predictor variables that are best able to reproduce the observations without over-fitting those observations. There are several downsides to this approach. First, it requires multiple steps – the user needs to run model selection to identify the predictor variables and then subsequently solve the inverse problem to estimate the unknowns $\mathbf{s}$. Second, there are often computational compromises required to implement model selection. If there are $p$ possible predictor variables, there are $2^{p}$ different combinations to evaluate that range in size from $0$ to $p$ total predictor variables. One possible way to reduce the size of this search space is to implement the partial $F$ test using forward, backward, or stepwise selection.
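A greedy alternative to the $2^{p}$ enumeration is sketched below; here `score` stands for any of the criteria above, and the function name and interface are illustrative. Forward selection requires at most $p(p+1)/2$ score evaluations instead of $2^{p}$:

```matlab
% Greedy forward selection: starting from the empty model, repeatedly add
% the predictor that most improves the score; stop when no addition helps.
function S = forwardSelect(p, score)
  S = [];                                   % current set of selected columns
  best = score(S);                          % score of the empty model
  improved = true;
  while improved && numel(S) < p
    improved = false;
    cand = setdiff(1:p, S);                 % predictors not yet selected
    trial = arrayfun(@(j) score(sort([S, j])), cand);
    [val, i] = min(trial);
    if val < best                           % accept the best single addition
      best = val;  S = sort([S, cand(i)]);  improved = true;
    end
  end
end
```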
Third, existing approaches to model selection may not work at all for very large inverse problems due to computational limitations. For example, existing inverse modeling and kriging studies that implement model selection do so by calculating WSS for different combinations of predictor variables as defined in Eq. (3). This equation requires formulating and inverting $\boldsymbol{\Psi} = \mathbf{H}\mathbf{Q}\mathbf{H}^{\top} + \mathbf{R}$, a task that is not computationally feasible for many large inverse problems or for problems where $\mathbf{H}$ is not explicitly available as a matrix but rather where only the outputs of forward and adjoint models are available. More recently, a handful of studies have replaced $\boldsymbol{\Psi}$ with an approximate diagonal matrix. This compromise makes it possible to estimate WSS, but a downside is that this approximation for $\boldsymbol{\Psi}$ does not match the actual values of $\mathbf{Q}$ and $\mathbf{R}$ that are used in solving the inverse problem.
3 Proposed approach: msHyBR

In this study, we develop a new approach, msHyBR, for incorporating prior information or predictor variables into an inverse model. This approach directly addresses several of the challenges described above. First, it is computationally efficient, even for very large inverse problems with many observations and/or unknowns where it is not possible to compute $\boldsymbol{\Psi}$ (or, more appropriately, solve linear systems with $\boldsymbol{\Psi}$) and for problems where an explicit matrix $\mathbf{H}$ is not available. Second, the computational cost of this approach does not scale exponentially with the number of predictor variables (in contrast with other methods that scale with the number of combinations, i.e., $2^{p}$). Third, the proposed approach will determine a set of predictor variables and solve for the unknown quantity in a single step, as opposed to the two-step process required for existing model selection algorithms. Overall, the proposed approach opens the door for assimilating large amounts of prior information or numbers of predictor variables within an inverse model. We argue that this capability is important, particularly as the number of environmental, Earth science, and remote sensing datasets continues to grow.
3.1 Model structure and assumptions
A key aspect of this work is proposing a more comprehensive statistical model of the problem defined by Eqs. (1) and (2). Recall that $\mathbf{s}$ is the unknown quantity of interest in the inverse problem in Eq. (2), and several predictor variables are used in the estimation of this quantity, as stated in Eq. (1). In this work, we model the coefficients of the predictors ($\boldsymbol{\beta}$) as random instead of deterministic variables, a contrast to many existing inverse modeling studies.
Specifically, we assume the priors
$$\boldsymbol{\zeta} \sim \mathcal{N}\left(\mathbf{0}, \lambda^{-2}\mathbf{Q}\right), \qquad \beta_{i} \sim \mathrm{Laplace}\left(0, \alpha^{-1}\right) \quad \text{for } i = 1, \dots, p, \qquad (7)$$
where the predictor coefficients $\beta_{i}$ are components of the vector $\boldsymbol{\beta}$ and follow a Laplace (also known as double exponential) distribution that promotes sparsity. The parameter $\alpha$ controls the shape of that distribution. Note that we also include a regularization parameter ($\lambda$) that scales the covariance matrix ($\mathbf{Q}$) such that the overall inverse model is consistent with the actual model–observation residuals. Finally, note that the overall model structure with this regularization parameter can be reformulated more compactly by substituting Eq. (1) into Eq. (2):
$$\mathbf{z} = \mathbf{H}\mathbf{X}\boldsymbol{\beta} + \mathbf{H}\boldsymbol{\zeta} + \boldsymbol{\varepsilon}. \qquad (8)$$
Assuming Eqs. (7) and (8) and according to Bayes' theorem, the density function of the joint posterior probability of $\mathbf{s}$ and $\boldsymbol{\beta}$ is
$$p\left(\mathbf{s}, \boldsymbol{\beta} \mid \mathbf{z}\right) \propto \exp\left(-\frac{1}{2}\left\|\mathbf{z} - \mathbf{H}\mathbf{s}\right\|_{\mathbf{R}^{-1}}^{2} - \frac{\lambda^{2}}{2}\left\|\mathbf{s} - \mathbf{X}\boldsymbol{\beta}\right\|_{\mathbf{Q}^{-1}}^{2} - \alpha\left\|\boldsymbol{\beta}\right\|_{1}\right). \qquad (9)$$
Here, all terms in the exponent are vector norms, and, in particular, $\left\|\mathbf{x}\right\|_{\mathbf{M}}^{2} = \mathbf{x}^{\top}\mathbf{M}\mathbf{x}$ for any symmetric positive-definite (SPD) matrix $\mathbf{M}$. Further, the symbol $\propto$ denotes proportionality. Note that the joint posterior probability for $\mathbf{s}$ and $\boldsymbol{\beta}$ is not Gaussian; however, the MAP estimate can be computed by solving the following optimization problem:
$$\min_{\mathbf{s}, \boldsymbol{\beta}} \ \frac{1}{2}\left\|\mathbf{z} - \mathbf{H}\mathbf{s}\right\|_{\mathbf{R}^{-1}}^{2} + \frac{\lambda^{2}}{2}\left\|\mathbf{s} - \mathbf{X}\boldsymbol{\beta}\right\|_{\mathbf{Q}^{-1}}^{2} + \alpha\left\|\boldsymbol{\beta}\right\|_{1}. \qquad (10)$$
Our approach is analogous to the use of $\ell_{1}$ penalties in regression and can be seen as an extension of the Bayesian LASSO to large-scale inverse modeling. Moreover, potential limitations of the Laplace prior have been studied, and other Bayesian models can be used for sparsity promotion. Changing the modeling assumptions may lead to different model selection techniques.
The remainder of this section is dedicated to describing a computationally efficient algorithm called msHyBR to solve Eq. (10), i.e., to estimate $\mathbf{s}$ and $\boldsymbol{\beta}$ simultaneously. msHyBR is an iterative procedure that takes as input both the problem inputs and the set of predictor variables in $\mathbf{X}$. At each iteration, a projected problem is solved and the solution subspace is expanded until some stopping criteria are satisfied. Reconstructed estimates of $\mathbf{s}$ and $\boldsymbol{\beta}$ are the outputs of the algorithm, and subsequent thresholding of $\boldsymbol{\beta}$ can be done, e.g., to identify important predictor variables. A general overview of the approach is provided in Fig. 1.
Figure 1
General approach for simultaneous model selection and inversion. Given both problem inputs and predictor variables, msHyBR is an iterative procedure that solves a projected problem (with automatic estimation of the parameters $\lambda$ and $\alpha$) and expands the solution subspace until some stopping criteria are satisfied. Reconstructed estimates of $\mathbf{s}$ and $\boldsymbol{\beta}$ can be used for further analysis, i.e., to identify important predictor variables $\mathbf{X}_{\mathcal{I}}$, where $\mathcal{I}$ is the set of selected indices.
3.2 Methodology and algorithm

To handle the computational burden of computing the inverse or square root of the covariance matrix ($\mathbf{Q}$), we begin by transforming the inverse problem in Eq. (10). This transformation is crucial for many high-dimensional problems where $\mathbf{Q}$ can only be accessed through matrix–vector products. Let
$$\mathbf{s} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Q}\mathbf{x}, \qquad (11)$$
so that the solution of the problem defined in Eq. (10) can subsequently be obtained by solving the following optimization problem:
$$\min_{\mathbf{x}, \boldsymbol{\beta}} \ \frac{1}{2}\left\|\mathbf{z} - \mathbf{H}\mathbf{X}\boldsymbol{\beta} - \mathbf{H}\mathbf{Q}\mathbf{x}\right\|_{\mathbf{R}^{-1}}^{2} + \frac{\lambda^{2}}{2}\left\|\mathbf{x}\right\|_{\mathbf{Q}}^{2} + \alpha\left\|\boldsymbol{\beta}\right\|_{1}. \qquad (12)$$
Solving such optimization problems can be challenging, and Krylov subspace methods, which are iterative approaches based on Krylov subspace projections, are ideal for working in subspaces for large-scale problems. For problems where it is assumed that the predictor coefficients follow a Gaussian distribution, generalized hybrid projection methods have been described in the literature. However, solving Eq. (12) is more difficult due to the $\ell_{1}$ regularizer, which is not differentiable at the origin. Several techniques have been devised to approximate the solution of similar inverse problems. One approach is to use nonlinear optimization methods, particularly iterative shrinkage algorithms such as FISTA or separable approximations such as SpaRSA.
Alternately, one can use a majorization–minimization (MM) approach, which involves successively minimizing a sequence of quadratic tangent majorants of the original functional centered at each approximation of the solution. For example, methods based on iterative schemes that approximate the $\ell_{1}$-norm regularization term by a sequence of weighted $\ell_{2}$ terms, in combination with Krylov methods, are usually referred to as iterative re-weighted norm (IRN) schemes. In this paper, we build on this strategy. In particular, for Eq. (12), and given an initial guess $\boldsymbol{\beta}^{(0)}$, we can solve a sequence of re-weighted least-squares problems of the form
$$\min_{\mathbf{x}, \boldsymbol{\beta}} \ \frac{1}{2}\left\|\mathbf{z} - \mathbf{H}\mathbf{X}\boldsymbol{\beta} - \mathbf{H}\mathbf{Q}\mathbf{x}\right\|_{\mathbf{R}^{-1}}^{2} + \frac{\lambda^{2}}{2}\left\|\mathbf{x}\right\|_{\mathbf{Q}}^{2} + \frac{\alpha}{2}\left\|\mathbf{L}_{\boldsymbol{\beta}^{(j)}}\boldsymbol{\beta}\right\|_{2}^{2}, \qquad (13)$$
where $\mathbf{L}_{\boldsymbol{\beta}^{(j)}}$ is an invertible diagonal matrix constructed such that $\left[\mathbf{L}_{\boldsymbol{\beta}^{(j)}}\right]_{ii} = \left(\max\left\{\left|\beta_{i}^{(j)}\right|, \tau\right\}\right)^{-1/2}$, with $\tau$ a positive constant independent of $\boldsymbol{\beta}$. Here, $\beta_{i}^{(j)}$ corresponds to the components of $\boldsymbol{\beta}^{(j)}$. Then, each step of the MM approach consists of solving Eq. (13). For high-dimensional problems, it is unfeasible to solve Eq. (13) directly, but an iterative method such as the generalized hybrid method mentioned above could be used. However, this strategy has two main disadvantages: (1) using an iterative method yields an inner–outer optimization scheme, which can be very computationally expensive, and (2) the regularization parameters $\lambda$ and $\alpha$ in Eq. (13) cannot be selected independently, since previous algorithms require assuming a fixed, known relationship between them (even if $\lambda$ can be computed automatically).
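The following self-contained MATLAB sketch illustrates the IRN/MM idea on a generic problem $\min_{\mathbf{b}} \frac{1}{2}\|\mathbf{A}\mathbf{b} - \mathbf{z}\|_{2}^{2} + \alpha\|\mathbf{b}\|_{1}$, i.e., without the $\mathbf{x}$ block and the weighted norms of Eq. (13); the weight construction and the constant $\tau$ mirror the text, while the function name and interface are illustrative:

```matlab
% Iteratively re-weighted norm (IRN / MM) sketch for l1-regularized least
% squares: each outer step solves a weighted l2 (Tikhonov) problem that
% majorizes the l1 term at the current iterate.
function b = irnL1(A, z, alpha, tau, nOuter)
  p = size(A, 2);
  b = zeros(p, 1);                          % initial guess b^(0)
  for j = 1:nOuter
    w = 1 ./ max(abs(b), tau);              % weights ~ 1/|b_i|; tau avoids 1/0
    L = diag(sqrt(w));                      % so ||L*b||_2^2 approximates ||b||_1
    % solve min ||A*b - z||^2 + alpha*||L*b||^2 via a stacked LS solve
    b = [A; sqrt(alpha)*L] \ [z; zeros(p, 1)];
  end
end
```

For a large-scale problem, each of these inner weighted least-squares solves would itself require an iterative method, which is precisely the inner–outer scheme that the flexible Krylov strategy below avoids.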
To overcome these shortcomings, we use flexible Krylov methods, which use iteration-dependent preconditioning to build a suitable basis for the solution and have been shown to be very competitive for solving problems involving $\ell_{1}$-norm regularization in other contexts.
Note that since we are considering a joint model, we need to build a solution space for both the variable $\mathbf{x}$, which is related to the unknown quantity of interest ($\mathbf{s}$) through the efficient change of variables defined in Eq. (11), and the predictor coefficients ($\boldsymbol{\beta}$). Leveraging flexible Krylov methods, we use the flexible generalized Golub–Kahan (FGGK) process to generate a basis for the solution, which is augmented with a new basis vector at each iteration.
3.2.1 Flexible generalized Golub–Kahan (FGGK) iterative process

The FGGK process is an iterative procedure that constructs bases for both $\mathbf{x}$ and $\boldsymbol{\beta}$, where each basis vector is stored as a column of the corresponding basis matrix. First, the initialization step normalizes the data vector $\mathbf{z}$ in the inner product defined by $\mathbf{R}^{-1}$ to obtain the first basis vector for the data space. At each iteration $k$, the FGGK process generates new basis vectors by updating a Golub–Kahan-type recurrence, where the normalization scalars are chosen so that each new vector has unit norm in the appropriate weighted inner product; see Algorithm 1. Generally, this process involves constructing new direction vectors and orthonormalizing them using appropriate inner products. In particular, each new basis vector for the unknown space is constructed by applying the adjoint of the forward operator, orthogonalizing against the previous basis vectors using the inner product defined by $\mathbf{Q}$, and normalizing using the corresponding norm induced by this inner product. The analogous process is used to construct the basis vectors for the data space, which are orthonormalized with respect to the previous vectors using the inner product defined by $\mathbf{R}^{-1}$.
Equivalently, we can collect the basis vectors as the columns of matrices $\mathbf{U}_{k+1}$ (for the data space) and $\mathbf{Z}_{k}$ (for the joint solution space) so that, by construction, the basis vectors are orthonormal in the corresponding weighted inner products in exact arithmetic. The recurrence relations can then be expressed more compactly in augmented matrix form, where the projected matrix $\mathbf{M}_{k}$ is upper Hessenberg, the matrix $\mathbf{T}_{k}$ collecting the orthogonalization coefficients is upper triangular, and the solution space for the joint variable $(\mathbf{x}, \boldsymbol{\beta})$ is spanned by the columns of $\mathbf{Z}_{k}$.
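For intuition, the sketch below implements a plain generalized Golub–Kahan process with the weighted inner products used here ($\mathbf{R}^{-1}$ in the data space, $\mathbf{Q}$ in the unknown space), but without the flexible preconditioning or the augmented $\boldsymbol{\beta}$ block that FGGK adds; the full re-orthogonalization and all names are illustrative choices:

```matlab
% Simplified generalized Golub-Kahan (not the full FGGK): builds bases U
% (R^{-1}-orthonormal) and V (Q-orthonormal) using only matrix-vector
% products with H, H', Q, and R^{-1}.
function [U, V] = genGK(H, Q, Rinv, z, k)
  n = numel(z);  m = size(Q, 1);
  U = zeros(n, k+1);  V = zeros(m, k);
  u = z / sqrt(z' * (Rinv*z));              % normalize in the R^{-1} inner product
  U(:, 1) = u;
  for i = 1:k
    v = H' * (Rinv*u);                      % new direction for the unknown space
    for j = 1:i-1                           % orthogonalize in the Q inner product
      v = v - (V(:, j)' * (Q*v)) * V(:, j);
    end
    v = v / sqrt(v' * (Q*v));  V(:, i) = v;
    u = H * (Q*v);                          % new direction for the data space
    for j = 1:i                             % orthogonalize in the R^{-1} inner product
      u = u - (U(:, j)' * (Rinv*u)) * U(:, j);
    end
    u = u / sqrt(u' * (Rinv*u));  U(:, i+1) = u;
  end
end
```

Note that $\mathbf{Q}^{1/2}$ and $\mathbf{Q}^{-1}$ never appear: the covariance enters only through products $\mathbf{Q}\mathbf{v}$, which is the property that makes the approach feasible at large scale.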
3.2.2 Computation of the solution

After a new basis vector is included in the solution space, one needs to (partially) solve a subproblem as defined in Eq. (13). That is, at each iteration, we solve a small projected least-squares problem to approximate the solution of Eq. (13) in a space of increasing dimension. In particular, using the relations obtained from the FGGK process, at each iteration we solve
the projected problem
$$\min_{\mathbf{y}} \ \frac{1}{2}\left\|\mathbf{M}_{k}\mathbf{y} - \mathbf{c}_{k}\right\|_{2}^{2} + \frac{\lambda^{2}}{2}\left\|\mathbf{y}^{(\mathbf{x})}\right\|_{2}^{2} + \frac{\alpha}{2}\left\|\mathbf{y}^{(\boldsymbol{\beta})}\right\|_{2}^{2} \qquad (20)$$
for the projected coordinates $\mathbf{y} = \left[\mathbf{y}^{(\mathbf{x})}; \mathbf{y}^{(\boldsymbol{\beta})}\right]$ or, equivalently, for $\mathbf{x}_{k}$ and $\boldsymbol{\beta}_{k}$, where $\mathbf{c}_{k}$ denotes the projected right-hand side obtained from the FGGK relations. Finally, we need to undo the change of variables defined in Eq. (11) so that the solution of the original problem in Eq. (10) is approximated at each iteration by $\mathbf{s}_{k} = \mathbf{X}\boldsymbol{\beta}_{k} + \mathbf{Q}\mathbf{x}_{k}$. Note that, to simplify the notation when deriving the model, we assumed that the regularization parameters $\lambda$ and $\alpha$ are fixed, but these can be automatically updated at each iteration so that, effectively, $\lambda = \lambda_{k}$ and $\alpha = \alpha_{k}$ in Eq. (20). This procedure will be explained in the following section. The pseudo-code for the new model selection HyBR method (msHyBR) can be found in Algorithm 1.
Note that Eq. (20) is a standard least-squares problem with two Tikhonov regularization terms, and the coefficient matrix is small (its dimensions grow only with the iteration count $k$), so the solution can be computed efficiently. Efficient QR updates for $\mathbf{M}_{k}$ have also been considered in the literature.
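A minimal runnable sketch of such a projected solve is given below; the sizes, the random Hessenberg matrix, and the coordinate split are illustrative stand-ins for the FGGK quantities. Stacking the regularization terms under the projected matrix reduces Eq. (20) to a single small least-squares solve:

```matlab
% Small projected problem with two Tikhonov-type terms,
%   min_y ||M*y - c||^2 + lam^2*||y(1:k1)||^2 + alph*||y(k1+1:end)||^2,
% solved by stacking; cheap because M grows only with the iteration count.
k1 = 5;  k2 = 3;  k = k1 + k2;
M = triu(randn(k+1, k), -1);                % upper-Hessenberg stand-in
c = randn(k+1, 1);                          % projected right-hand side
lam = 1e-1;  alph = 1e-2;
P1 = [eye(k1), zeros(k1, k2)];              % picks out the x-coordinates
P2 = [zeros(k2, k1), eye(k2)];              % picks out the beta-coordinates
y = [M; lam*P1; sqrt(alph)*P2] \ [c; zeros(k1+k2, 1)];
```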
Each iteration of msHyBR requires one matrix–vector multiplication with $\mathbf{H}$ and one with its adjoint, matrix–vector multiplications with $\mathbf{X}$ and its adjoint, matrix–vector products with $\mathbf{Q}$, the inversion of a diagonal matrix, which is $\mathcal{O}(p)$ floating point operations (flops), and additional flops proportional to $k(n+m)$ for the orthogonalization steps. Computing the solution of the projected problem in Eq. (20) costs $\mathcal{O}\left(k^{2}\right)$ flops, since $\mathbf{M}_{k}$ is upper Hessenberg. Since $k \ll n$ and $k \ll m$, the overall cost of the msHyBR algorithm is dominated by the matrix–vector products with $\mathbf{H}$, $\mathbf{Q}$, and $\mathbf{X}$. It is important to note that the projected problem in Eq. (20) for msHyBR is much cheaper to solve than Eq. (13) within each MM iteration due to its optimization over a lower-dimensional space.
Algorithm 1 Hybrid method for model selection (msHyBR).

Require: matrix $\mathbf{H}$, positive-definite matrices $\mathbf{Q}$ and $\mathbf{R}$, predictor matrix $\mathbf{X}$, and vector $\mathbf{z}$.

1. Initialize the first basis vector from $\mathbf{z}$, normalized in the inner product defined by $\mathbf{R}^{-1}$.
2. while stopping criteria are not satisfied do
3.  Expand the basis for the unknown space: apply the adjoint operators, orthogonalize against the previous basis vectors in the inner product defined by $\mathbf{Q}$, and normalize (FGGK step).
4.  Expand the basis for the coefficient space using the iteration-dependent preconditioner $\mathbf{L}_{\boldsymbol{\beta}_{k-1}}$, orthogonalize, and normalize.
5.  Expand the basis for the data space: apply the forward operators, orthogonalize in the inner product defined by $\mathbf{R}^{-1}$, and normalize.
6.  Update the QR factorization of the projected matrix $\mathbf{M}_{k}$.
7.  Solve Eq. (20) to get $(\mathbf{x}_{k}, \boldsymbol{\beta}_{k})$ with selected regularization parameters $(\lambda_{k}, \alpha_{k})$.
8. end while
9. return approximations $\mathbf{s}_{k} = \mathbf{X}\boldsymbol{\beta}_{k} + \mathbf{Q}\mathbf{x}_{k}$ and $\boldsymbol{\beta}_{k}$
One of the benefits of the proposed method is that it conveniently allows us to automatically estimate $\lambda$ and $\alpha$ in Eq. (10) throughout the iterations. This feature is common in hybrid projection methods for a single parameter and has more recently been extended to several parameters.
To validate the potential of the method independently of the regularization parameter choice criteria, we first consider the optimal regularization parameters with respect to the reconstruction error norm – that is,
$$\left(\lambda_{k}, \alpha_{k}\right) = \arg\min_{\lambda, \alpha} \left\|\mathbf{s}_{k}(\lambda, \alpha) - \mathbf{s}_{\mathrm{true}}\right\|_{2}. \qquad (23)$$
Note that this is of course unfeasible for real problems where the solution is unknown. In that case, one can use many different regularization parameter choice criteria. In this paper, we focus on the discrepancy principle (DP), since it is an established criterion. This consists of choosing $\lambda$ and $\alpha$ such that
$$\left\|\mathbf{z} - \mathbf{H}\mathbf{s}_{k}\right\|_{\mathbf{R}^{-1}}^{2} = \eta\, n, \qquad (24)$$
where $\eta$ is a predetermined parameter.
Moreover, note that other regularization parameter choice criteria can be applied seamlessly using an analogous approach; a few standard choices for the two-parameter case have been described in the literature.
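To illustrate the DP for a single parameter, the sketch below selects $\lambda$ for a small Tikhonov-regularized problem by root-finding on the weighted residual; the dense solve, the search bracket, and all names are illustrative assumptions (msHyBR instead applies the criterion to the cheap projected problem and updates the parameters at each iteration):

```matlab
% Discrepancy principle for one parameter: find lam such that
%   ||z - H*s(lam)||_{Rinv}^2 = eta*n,  with s(lam) the Tikhonov solution.
function lam = dpSelect(H, Qinv, Rinv, z, eta)
  n = numel(z);
  res = @(lam) discrepancy(lam, H, Qinv, Rinv, z) - eta*n;
  lam = fzero(res, [1e-6, 1e3]);            % assumes a sign change in this bracket
end

function d = discrepancy(lam, H, Qinv, Rinv, z)
  s = (H'*Rinv*H + lam^2*Qinv) \ (H'*Rinv*z);  % dense solve: small demos only
  r = z - H*s;
  d = r' * (Rinv*r);                        % weighted residual norm squared
end
```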
4 Numerical experiments

In this section, we present three examples to evaluate the proposed approach msHyBR for model selection in inverse problems. First, we present a small one-dimensional signal deblurring example to compare the proposed method with existing model selection approaches and other inverse methods that assume different priors. Second, we develop a hypothetical AIM example featuring a real forward model but synthetic data where the mean and subsequently the predictor variables are constructed using a zonation model. Third, we present a case study on estimating CO2 sources and sinks (i.e., CO2 fluxes) across North America using a year of synthetic CO2 observations from NASA's OCO-2 satellite. In this final case study, we use the proposed algorithm to predict synthetic CO2 fluxes using numerous meteorological and environmental predictor variables.
We discuss three different classes of methods for model selection.

- Bayesian approaches. This includes the proposed msHyBR approach, which performs model selection in a one-step manner. We compare against two related hybrid projection methods: the genHyBR approach, which corresponds to the known mean case (which we assume to be zero), and the genHyBRmean approach (Algorithm B1), which estimates the mean coefficients using a Gaussian prior – that is, no sparsity is imposed. The latter method requires estimating the two parameters $\lambda$ and $\alpha$, but it requires assuming a fixed, known relationship between them for a specific application.

- Exhaustive selection. We use the AIC and BIC criteria with exhaustive search (denoted as “exh” in the results).

- Forward selection. Since the variance ratio test is based on a pairwise comparison, an exhaustive search would require implementing the test a combinatorially large number of times, which is infeasible to calculate for large $p$. Therefore, we perform the variance ratio test with a forward selection method. For comparison, we also include AIC and BIC with forward selection (denoted as “fwd” in the results).
4.1 One-dimensional deblurring example
The first case study discussed here is a simple 1-D inverse modeling example where the solution is a combination of several polynomial functions. We use this hypothetical case study to examine the basic behavior of different model selection algorithms, including the proposed msHyBR algorithm and exhaustive combinatorial search for the optimal solution. We apply these algorithms to more complicated and challenging problems in subsequent case studies.
More specifically, this example concerns an application in 1-D signal deblurring, involving a Gaussian blur with blurring parameter $\sigma$. The elements of the matrix $\mathbf{H}$ representing the forward model are defined as
$$\left[\mathbf{H}\right]_{ij} = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(i-j)^{2}}{2\sigma^{2}}\right).$$
We assume that the covariates involve seven predictor variables corresponding to the discretized representations of the first five Chebyshev polynomial basis functions of the first kind and two mirrored Heaviside (step) functions, evaluated on a grid of equispaced points. The columns of $\mathbf{X}$ are displayed in Fig. 2a. Moreover, the covariance matrix $\mathbf{Q}$ is constructed using a Matérn covariance kernel. For this example, the exact solution has been created using a realization of the model in Eq. (1). The sparse true coefficients $\boldsymbol{\beta}_{\mathrm{true}}$ are displayed in Fig. 4. Last, the covariance of the additive noise is $\mathbf{R} = \sigma_{\varepsilon}^{2}\mathbf{I}$. The measurements $\mathbf{z}$, along with their noiseless counterpart, are displayed in Fig. 2b, while Fig. 2c shows the true solution as well as the contributions corresponding to the mean and the stochastic component.
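For readers who want to reproduce the flavor of this setup, the sketch below builds a Gaussian blur operator and the seven covariates; the grid size, blur width, jump location, and coefficient values are placeholders rather than the exact values used in this example:

```matlab
% Illustrative 1-D deblurring setup: Gaussian blur matrix plus seven
% covariates (five Chebyshev polynomials and two mirrored step functions).
m = 128;  t = linspace(-1, 1, m)';  sigma = 4;    % sigma in grid points (assumed)
[I, J] = meshgrid(1:m, 1:m);
H = exp(-(I - J).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));  % Gaussian blur
T = @(k) cos(k*acos(t));                          % Chebyshev polynomials T_k on [-1,1]
X = [T(0), T(1), T(2), T(3), T(4), double(t > 0), double(t <= 0)];
beta_true = [0; 1; 0; -1; 0; 2; 0];               % sparse coefficients (illustrative)
s_mean = X * beta_true;                           % mean component X*beta
```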
Figure 2
For the one-dimensional deblurring example, predictor variables (or columns of $\mathbf{X}$), measurements with and without noise, and the true solution are provided. The horizontal axis denotes the domain where the signal and measurements are defined.
We use the msHyBR approach to compute the reconstructions of $\mathbf{s}$ and $\boldsymbol{\beta}$. Note that the estimated coefficients, $\widehat{\boldsymbol{\beta}}$, can be used for model selection by selecting the columns corresponding to the coefficients of $\widehat{\boldsymbol{\beta}}$ whose absolute value is higher than a chosen threshold. Also note that for the two-stage methods based on model selection, the inverse problem is solved only using the subset of selected covariates obtained with the methods described at the beginning of Sect. 4. This gives rise to a vector of coefficients $\widehat{\boldsymbol{\beta}}_{\mathcal{I}}$, which is of smaller or equal dimension to $\widehat{\boldsymbol{\beta}}$, since it corresponds only to the coefficients associated with the previously selected columns. To compare the estimated and true coefficients, and as observed in Fig. 4, these vectors have been augmented with zeroes on the coefficients whose indices are not in the set of selected indices $\mathcal{I}$.
4.1.1 Comparison of the reconstructions

For this example, the new method msHyBR is competitive in terms of the quality of the reconstructions of $\mathbf{s}$ and the coefficients $\boldsymbol{\beta}$, as displayed in Figs. 3 and 4, respectively. The three panels correspond to Bayesian methods (a), exhaustive selection methods (b), and forward selection methods (c). For the comparison with Bayesian models, we observe from Fig. 3a that genHyBR produces an oscillatory solution, which can be attributed to the stochastic component ($\boldsymbol{\zeta}$), since the mean $\mathbf{X}\boldsymbol{\beta} = \mathbf{0}$ in Eq. (1). By contrast, the genHyBRmean reconstruction is smoother than the genHyBR reconstruction, which highlights the importance of including adept predictor variables. The genHyBRmean reconstruction is similar to but not as accurate as that of the proposed msHyBR algorithm, which is evident in the relative reconstruction error norms computed as $\left\|\mathbf{s}_{k} - \mathbf{s}_{\mathrm{true}}\right\|_{2} / \left\|\mathbf{s}_{\mathrm{true}}\right\|_{2}$, where $\mathbf{s}_{\mathrm{true}}$ is the true solution and $\mathbf{s}_{k}$ is the reconstruction at the $k$th iteration. These values are provided as a function of the iteration in Fig. 5. This improvement can be attributed to msHyBR's superior reconstruction of the coefficients in $\boldsymbol{\beta}$; see Fig. 4a. Recall that genHyBRmean assumes a Gaussian distribution on $\boldsymbol{\beta}$, and this assumption has a smoothing effect on the computed $\widehat{\boldsymbol{\beta}}$, causing it to deviate from the true $\boldsymbol{\beta}$.
We observe that the msHyBR reconstruction of $\mathbf{s}$ is similar to those corresponding to the exhaustive selection and the forward selection approaches. However, from the reconstructions of $\boldsymbol{\beta}$ provided in Fig. 4b and c, we observe that msHyBR identifies appropriate weights for the fourth and sixth coefficients. Next, we compare the performance of the methods for model selection by including standard quantitative measures for the evaluation of binary classifiers.
Figure 3
Reconstructed quantity of interest ($\mathbf{s}$) for the one-dimensional deblurring example with predictor variables described in Sect. 4.1 compared to the true solution $\mathbf{s}_{\mathrm{true}}$.
Figure 4
Reconstructed predictor coefficients ($\boldsymbol{\beta}$) for the one-dimensional deblurring example with predictor variables described in Sect. 4.1 compared to the true $\boldsymbol{\beta}$.
Figure 5
Relative reconstruction error norm history (in logarithmic scale) for the one-dimensional deblurring example with predictor variables described in Sect. 4.1.
4.1.2 Comparison of the model selection

We show the different elements in the confusion matrix as well as the F1 score, defined as
$$\mathrm{F1} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}, \qquad (25)$$
where TP and FP are true positive and false positive, respectively, and FN is false negative. Note that F1 can take values between 0 and 1, with 1 corresponding to a perfect classification. Note that the score F1 in Eq. (25) corresponds to the harmonic mean of the positive predictive value or precision (the fraction of the selected columns that are in the true set) and the sensitivity or recall (the fraction of the true set of relevant columns that is identified by the method). The results are provided in Table 1, where the best-performing algorithm for each of the categories is highlighted in boldface. For this example, msHyBR performs competitively. It is also interesting to note that, as expected theoretically, genHyBRmean tends to produce false positives due to the smoothing effect on the coefficients $\boldsymbol{\beta}$.
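As a quick check of Eq. (25), the following snippet computes the confusion-matrix entries and F1 from index sets; the sets shown are illustrative, with the second one reproducing the 0.86 reported for the forward selection methods in Table 1:

```matlab
% F1 score from the selected and true predictor index sets, Eq. (25).
trueSet = [1 3 6];  selSet = [1 3 4 6];     % illustrative index sets
TP = numel(intersect(selSet, trueSet));     % = 3
FP = numel(setdiff(selSet, trueSet));       % = 1
FN = numel(setdiff(trueSet, selSet));       % = 0
F1 = 2*TP / (2*TP + FP + FN)                % = 6/7, approx. 0.86
```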
Table 1Confusion matrix for the one-dimensional deblurring example: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). The F1 score is defined in Eq. (). The best-performing methods for each category are marked in boldface.
Method | TP | FP | TN | FN | F1
---|---|---|---|---|---
msHyBR | 3 | 0 | 4 | 0 | 1 |
exh BIC | 3 | 0 | 4 | 0 | 1 |
exh AIC | 3 | 0 | 4 | 0 | 1 |
fwd test | 3 | 1 | 3 | 0 | 0.86 |
fwd BIC | 3 | 1 | 3 | 0 | 0.86 |
fwd AIC | 3 | 1 | 3 | 0 | 0.86 |
genHyBRmean | 3 | 2 | 2 | 0 | 0.75 |
The aim of this example was to evaluate the proposed approach in comparison to existing model selection approaches for a small problem. Note that all the regularization parameters used to compute the solution of the inverse problem (regardless of whether this is done in one or two steps) are chosen to be optimal with respect to the relative reconstruction error norm. We consider more realistic case studies and parameter choice methods in the upcoming sections.
4.2 Hypothetical atmospheric inverse modeling (AIM) problem

This experiment concerns a synthetic inverse modeling problem where the true solution features distinct blocks of emissions in different regions of North America (i.e., a zonation model; see Fig. 6a). This case study is hypothetical, and the ground truth is known. The goal of this case study is to examine the performance of the new msHyBR method compared to a two-step process where the relevant predictor variables are chosen first (using standard model selection techniques) and then the solution of the inverse model is computed. The true emissions in this case study have relatively simple geographic patterns, much simpler than most real air pollutant or greenhouse gas emissions. With that said, this case study provides a clear-cut test for evaluating the algorithms developed here; model selection should identify predictor variables (i.e., columns of $\mathbf{X}$) corresponding to the large emissions blocks in the true solution (Fig. 6) and should not select predictor variables in other subregions with small or non-existent emissions. Overall, this example demonstrates that msHyBR can achieve similar reconstruction quality as a two-step approach and can also alleviate some of the computational challenges with model selection for larger problems.
In this experiment, we consider a synthetic atmospheric transport problem aimed at estimating the fluxes of an atmospheric tracer across North America, with a spatial resolution of 1° × 1°. For this example, we generated synthetic fluxes by only placing nonzero fluxes in a limited number of regions. These fluxes vary spatially but not temporally. We use a zonation model to build the predictor variables – that is, we divide North America into regions where the inner boundaries correspond to a grid of 10° longitude and 7° latitude, as can be observed in Fig. 6a. Each column of $\mathbf{X}$ corresponds to an indicator function for each of the subregions on the grid. Specifically, each column of $\mathbf{X}$ consists of ones and zeros: ones for all model grid boxes that fall within a specific subregion and zeros for all other grid boxes. The coefficients $\boldsymbol{\beta}$ determine the weights of each basis vector, and for this example, we use prescribed true values of $\boldsymbol{\beta}$ (shown in Fig. 7b); the corresponding mean image is provided in Fig. 6c. The true emissions in Fig. 6b were generated as $\mathbf{s}_{\mathrm{true}} = \mathbf{X}\boldsymbol{\beta}_{\mathrm{true}} + \boldsymbol{\zeta}_{\mathrm{true}}$, where $\boldsymbol{\zeta}_{\mathrm{true}}$ is a realization of $\mathcal{N}\left(\mathbf{0}, \mathbf{Q}\right)$, with $\mathbf{Q}$ representing a Matérn kernel; the same covariance model has been used in previous work.
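The construction of the zonation predictors can be sketched as follows; the grid extents and block edges are illustrative placeholders for the actual North American domain:

```matlab
% Each column of X is the indicator of one 10-degree x 7-degree block on
% the 1-degree flux grid.
lon = -170:1:-51;  lat = 10:1:79;           % 1-degree grid (illustrative extents)
[LON, LAT] = ndgrid(lon, lat);
lonE = -170:10:-50;  latE = 10:7:80;        % block edges
X = zeros(numel(LON), (numel(lonE)-1)*(numel(latE)-1));
c = 0;
for i = 1:numel(lonE)-1
  for j = 1:numel(latE)-1
    c = c + 1;
    inBlock = LON >= lonE(i) & LON < lonE(i+1) & ...
              LAT >= latE(j) & LAT < latE(j+1);
    X(:, c) = inBlock(:);                   % ones inside the block, zeros elsewhere
  end
end
```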
Figure 6
Synthetic data used in the atmospheric inverse modeling (AIM) problem described in Sect. 4.2. An illustration of the zonation model is provided in (a): the predictor variables (columns of $\mathbf{X}$) correspond to indicator functions in each of the delimited areas evaluated on the grid. The image of true emissions in (b) is the sum of the true mean image in (c) and the true stochastic component in (d).
The forward model represented by $\mathbf{H}$ is taken from NOAA's CarbonTracker-Lagrange project and is produced through the Weather Research and Forecasting (WRF) Stochastic Time-Inverted Lagrangian Transport (STILT) model system. The observations were simulated based on the spatial and temporal coordinates of OCO-2 observations from July through mid-August 2015, with additive noise following Eq. (2) and a noise covariance $\mathbf{R}$ prescribed in units of ppm. Given observations $\mathbf{z}$, the forward model $\mathbf{H}$, and the predictors $\mathbf{X}$, the goal of AIM is to estimate the unknown fluxes $\mathbf{s}$ along with the coefficients $\boldsymbol{\beta}$.
In a standard two-step approach, the first step is to select a set of important predictors (i.e., columns of $\mathbf{X}$). Exhaustive model selection methods are infeasible for this problem because there are more combinations of predictor variables than we can reasonably compute BIC scores for. Thus, we use a forward selection strategy with the BIC to select the predictor variables. For this example, the set of relevant covariates $\mathcal{I}$ as determined by forward BIC consisted of a small subset of the columns of $\mathbf{X}$. Then, with $\mathbf{X}_{\mathcal{I}}$, we solve the inverse problem using genHyBRmean.
The two-step approach has difficulties in regions with a smaller (but still positive) mean. This approach does not select predictor variables in these regions, though these features are broadly captured in the stochastic component. The reconstructions of the coefficients are provided in Fig. 7b, where we have augmented with zeroes the coefficients whose indices are not in $\mathcal{I}$. Furthermore, the relative reconstruction error norms per iteration of genHyBRmean are provided in Fig. 7a, and the reconstructions for the two-step approach are provided in the bottom row of Fig. 8.
Figure 7
(a) Relative reconstruction error norms per iteration of msHyBR and the two-step approach (forward BIC followed by genHyBRmean). (b) Predictor coefficients ($\boldsymbol{\beta}$) for the hypothetical AIM problem.
In contrast to the two-step results described above, the msHyBR method selects covariates in all regions that correspond to the true solution. For the msHyBR method, we allow the algorithm to simultaneously estimate the predictor variables for the mean along with the stochastic component. The reconstructions of the emissions $\mathbf{s}$, along with the computed mean and stochastic component using msHyBR, can be found in the top row of Fig. 8. The corresponding reconstructions for the two-step approach are provided for comparison. Note that the basis vector representation is constructed via thresholding of the reconstructed $\boldsymbol{\beta}$, which is provided in Fig. 7b. The msHyBR method inherently performs model selection by incorporating a sparsity-promoting prior on $\boldsymbol{\beta}$, which results in many reconstructed values close to zero. The relative reconstruction error norms per iteration of msHyBR provided in Fig. 7a show similar (and even slightly better) reconstruction quality compared with the two-step approach. Note that the msHyBR method achieves a similar reconstruction of the edges surrounding the regions with positive mean emissions compared to the two-step approach.
Figure 8
Reconstructions for the hypothetical AIM example corresponding to msHyBR in the top row and a two-step approach in the bottom row. On the left are basis vector representations obtained using selected predictor variables (thresholded for msHyBR). Although the reconstructions of $\mathbf{s}$ in both cases are similar, the forward BIC approach selects fewer predictor variables, and hence the stochastic component must resolve the difference, whereas the msHyBR approach simultaneously estimates predictor coefficients (many of which are small) and can estimate a smooth stochastic component (similar to the true images in Fig. 6).
For all of the reconstructions, we used the DP to compute the regularization parameter(s), and we take $\mathbf{Q}$ to represent a Matérn kernel with fixed parameters. We remark that one of the advantages of msHyBR is that the covariance scaling factor $\lambda$ in the model (see Eq. 7) can be estimated automatically as part of the reconstruction process. However, for the two-step process, an estimate of $\lambda$ is required for both the model selection process and the reconstruction. For the two-step process, we used a fixed value of $\lambda$ in the first step, since the parameter choice has a big impact on the number of selected covariates, and for the second step, we used the DP within genHyBRmean.
4.3 Biospheric CO2 flux example

This AIM case study focuses on CO2 fluxes across North America using synthetic observations based on NASA's OCO-2 satellite. In this case study, the true solution is based on CO2 fluxes in space and time from NOAA's CarbonTracker v2022 product (CT2022). For illustrative purposes, the true fluxes averaged over time are provided in Fig. 9. This product is estimated using in situ CO2 observations and is commonly used across the CO2 community. Unlike the previous hypothetical case study, the fluxes in this example vary every 3 h (at a 1° × 1° spatial resolution), covering a full calendar year (September 2014–August 2015). Thus, the total number of unknowns ($m$) for this example is orders of magnitude larger than in the previous example. Like the previous example, forward model simulations are from STILT simulations generated as part of NOAA's CarbonTracker-Lagrange project, with a correspondingly large number ($n$) of synthetic observations. We remark that this is a significantly more challenging test case because the problem size is so large that we cannot run comparisons with existing model selection methods (e.g., AIC or BIC). However, we include this example to highlight the applicability of our approach with very large datasets.
Figure 9
True solution representing average CO2 fluxes for the AIM problem based on NASA’s OCO-2 satellite observations.
For the predictors of CO2 fluxes, we use a combination of variables, including meteorological variables, in an approach similar to many existing AIM studies of CO2 fluxes. As described below, the candidate predictors fall into three groups: variables describing a seasonally varying mean, a mixed (and potentially collinear) set of meteorological variables, and a set of spurious random vectors included to test whether the algorithm discards uninformative predictors.
To produce more realistic scenarios and evaluate the performance of the new algorithm under noisy data, we add randomly generated Gaussian noise to the synthetic observations. This setup allows us to evaluate how the performance of the proposed inverse modeling approach changes as the noise or error levels change. Note that for a given realization of the noise $\boldsymbol{\varepsilon}$, we define the noise level to be $\left\|\boldsymbol{\varepsilon}\right\|_{2} / \left\|\mathbf{H}\mathbf{s}_{\mathrm{true}}\right\|_{2}$, and we consider two realizations corresponding to noise levels of 10 % (low noise) and 50 % (high noise), respectively. For the covariance matrix $\mathbf{Q}$, we use parameters from existing studies that employed this same case study. Specifically, we use a spherical covariance model with a decorrelation length of 586 km and a decorrelation time of 12 d. The diagonal elements of $\mathbf{Q}$ vary by month and have values ranging from (6.6)$^{2}$ to (102)$^{2}$ (in the flux units of the problem).
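The noise generation can be summarized by the following sketch, where `z_clean` is a stand-in for the noiseless synthetic observations $\mathbf{H}\mathbf{s}_{\mathrm{true}}$ (the vector shown is an illustrative placeholder):

```matlab
% Scale a Gaussian noise realization to a prescribed relative noise level.
z_clean = randn(1000, 1);                   % stand-in for H*s_true
nlevel = 0.10;                              % 0.10 (low noise) or 0.50 (high noise)
e = randn(size(z_clean));
e = nlevel * norm(z_clean) / norm(e) * e;   % enforce ||e|| / ||z_clean|| = nlevel
z = z_clean + e;
```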
Relative reconstruction error norms per iteration are provided in Fig. 10a. All results correspond to using the DP to select the regularization parameters. We observe that for both noise levels, reconstruction error norms for msHyBR follow the expected behavior (Fig. 10a), with slightly smaller errors for the smaller noise level.
Figure 10
(a) Relative reconstruction error norm histories for the AIM problem based on NASA's OCO-2 satellite observations with 10 % and 50 % noise. (b, c) Reconstructions of the coefficients $\boldsymbol{\beta}$ for 10 % (middle panel) and 50 % (right panel) noise. These results correspond to using the regularization parameters selected by the DP.
Figure 11
Reconstructions of the unknown quantities using the new model selection HyBR method (msHyBR) for the AIM problem based on NASA's OCO-2 satellite observations with 10 % and 50 % noise. Shown are the full reconstruction (left), the reconstruction of the mean (middle), and the reconstruction of the stochastic component (right). These results correspond to using the regularization parameters selected by the DP.
The estimated values of $\boldsymbol{\beta}$ using msHyBR for the 10 % and 50 % noise levels are provided in Fig. 10b and c, respectively. The dotted lines separate the coefficients corresponding to predictor variables of different natures. The first set of dots (in blue) represents the seasonality of the mean reconstruction. These estimated coefficients are roughly constant and bounded away from zero. Interestingly, these coefficients do not describe any seasonal variability in the estimated CO2 fluxes, which was one of the original motives for allowing the coefficients to vary by season. Rather, the seasonal variability is entirely captured by the meteorological variables and the stochastic component. With that said, these coefficients appear to describe a seasonally averaged mean behavior in the solution.
The second set of coefficients (in red) represents the mixed and potentially collinear set of meteorological variables. We observe that msHyBR can identify which of these coefficients are important and which should be dampened (i.e., corresponding to coefficients close to 0). For different noise levels, we observe that different meteorological variables may be picked up by msHyBR. For the last set of predictor variables, which represent spurious and unimportant random vectors, msHyBR successfully damps these coefficients at both noise levels. This result demonstrates msHyBR's ability to perform reasonable model selection via a sparsity-promoting prior on $\boldsymbol{\beta}$.
Moreover, the CO2 reconstructions averaged over time, computed using msHyBR for both noise levels, can be observed in Fig. 11. It is interesting to note that, in the low-noise-level scenario, the stochastic component captures more of the variability and therefore has a strong similarity to the final reconstruction, while in the high-noise-level case, most of the information in the total reconstruction comes from the mean (i.e., from the predictor variables). At higher noise levels, the inverse model makes less detailed adjustments to the CO2 fluxes via the stochastic component. By contrast, at lower noise levels, the inverse model interprets the observations as being more trustworthy or informative, and thus it estimates a stochastic component with more spatial variability.
5 Conclusion

This paper presents a set of computationally efficient algorithms for model selection for large-scale inverse problems. Specifically, we describe one-step approaches that use a sparsity-promoting prior for covariate selection and hybrid iterative projection methods for large-scale optimization. A main advantage over existing approaches is that both the reconstruction and the predictor coefficients can be estimated simultaneously. The proposed iterative approaches can take advantage of efficient matrix–vector multiplications (with the forward model and the prior covariance matrix) and estimate regularization parameters automatically during the inversion process.
Numerical experiments show that the performance of our methods is competitive with widely used two-step approaches that first identify a small number of important predictor variables and then perform the inversion. The proposed approach is cheaper (since it avoids expensive evaluations of all possible combinations of candidate predictors), which makes it superior for problems with many candidate variables (i.e., a large number of columns of ) and limited observations.
Future work includes subsequent uncertainty quantification (UQ) and analysis for the hierarchical model (Eqs. 7–10), which will likely have increased variances due to additional unknown parameters. Efficient UQ approaches will require Markov chain Monte Carlo sampling techniques due to the Laplace assumption. However, it may be possible to use Gaussian approximations and exploit low-rank structure from the FGGK process, similar to what has been done for Gaussian distributions.
Overall, we anticipate that algorithms like msHyBR will have increasing utility given the plethora of prior information that is often available for inverse modeling and given the computational need for inverse models that can ingest larger and larger satellite datasets.
Appendix A Notation details
To help with readability, especially for readers not so familiar with mathematical terminology, we provide the following list of terms and notation details.
$\mathbf{z}$ | Vector of observations that are to be fitted by the inversion.
$n$ | Number of observations (dimensionality of $\mathbf{z}$).
$\mathbf{s}$ | Vector of quantities to be estimated by the inversion.
$m$ | Number of quantities to be estimated (dimensionality of $\mathbf{s}$).
$\boldsymbol{\beta}$ | Vector of contributions from each of the candidate models. Model selection is achieved by promoting sparsity in $\boldsymbol{\beta}$, effectively setting contributions from some components to zero.
$p$ | Number of candidate predictors or models (dimensionality of $\boldsymbol{\beta}$).
$\mathbf{X}$ | Mapping from $\boldsymbol{\beta}$ to $\mathbf{s}$, whose columns may contain candidate models.
$\boldsymbol{\zeta}$ | Stochastic component of $\mathbf{s}$, assumed to be distributed as a zero-mean multivariate Gaussian with covariance matrix $\mathbf{Q}$.
$\boldsymbol{\varepsilon}$ | Error component of $\mathbf{z}$, assumed to be distributed as a zero-mean multivariate Gaussian with covariance matrix $\mathbf{R}$.
$\mathbf{H}$ | Forward mapping from $\mathbf{s}$ to the observations $\mathbf{z}$.
$\lambda$ | Regularization parameter, estimated as part of the inversion, applied as a scale factor of $\mathbf{Q}$.
$\alpha$ | Regularization parameter, estimated as part of the inversion, that controls the model selection.
$\mathbf{x}$ | Transformed variable ($\mathbf{s} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Q}\mathbf{x}$) introduced to avoid inverting $\mathbf{Q}$, so that $\mathbf{Q}$ only needs to be accessed via matrix–vector products.
$k$ | Iteration count, which also corresponds to the dimension of the Krylov subspace at that iteration.
– | Basis vectors added to the solution subspace at step $k$.
$\mathbf{s}_{k}, \boldsymbol{\beta}_{k}$ | Computed approximations to $\mathbf{s}$ and $\boldsymbol{\beta}$ at step $k$.
$\mathbf{e}_{1}$ | First column of the identity matrix.
Code and data availability
The MATLAB codes that were used to generate the results in Sect. 4 are available at https://doi.org/10.5281/zenodo.11164245. Current and future versions of the codes will also be made available online.
Author contributions
JC, SMM, and AKS designed the study. JJ ran preliminary studies, and MSL conducted the numerical simulations and comparisons. All the authors contributed to the writing and presentation of results.
Competing interests
The contact author has declared that none of the authors has any competing interests.
Disclaimer
Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
Acknowledgements
This work was partially supported by the National Science Foundation ATD program under grants DMS-2208294, DMS-2341843, DMS-2026830, and DMS-2026835. CarbonTracker CT2022 results were provided by NOAA GML, Boulder, Colorado, USA, from the website at https://carbontracker.noaa.gov.
Financial support
This research has been supported by the National Science Foundation (grant nos. DMS-2208294, DMS-2341843, DMS-2026830, and DMS-2026835).
Review statement
This paper was edited by Ludovic Räss and reviewed by Ian Enting and one anonymous referee.
© 2024. This work is published under the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/).
Abstract
Inverse models arise in various environmental applications, ranging from atmospheric modeling to geosciences. Inverse models can often incorporate predictor variables, similar to regression, to help estimate natural processes or parameters of interest from observed data. Although a large set of possible predictor variables may be included in these inverse or regression models, a core challenge is to identify a small number of predictor variables that are most informative of the model, given limited observations. This problem is typically referred to as model selection. A variety of criterion-based approaches are commonly used for model selection, but most follow a two-step process: first, select predictors using some statistical criteria, and second, solve the inverse or regression problem with these predictor variables. The first step typically requires comparing all possible combinations of candidate predictors, which quickly becomes computationally prohibitive, especially for large-scale problems. In this work, we develop a one-step approach for linear inverse modeling, where model selection and the inverse model are performed in tandem. We reformulate the problem so that the selection of a small number of relevant predictor variables is achieved via a sparsity-promoting prior. Then, we describe hybrid iterative projection methods based on flexible Krylov subspace methods for efficient optimization. These approaches are well-suited for large-scale problems with many candidate predictor variables. We evaluate our results against traditional, criteria-based approaches. We also demonstrate the applicability and potential benefits of our approach using examples from atmospheric inverse modeling based on NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite.
Details
1 Department of Mathematics, Emory University, Atlanta, GA, USA
2 School of Mathematics, University of Birmingham, Birmingham, UK
3 Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, MD, USA
4 Department of Mathematics, North Carolina State University, Raleigh, NC, USA