Modeling research in earth system science has blossomed in the past decades as models have become an essential tool to predict future states of the carbon cycle and climate. Hundreds of models have been developed to simulate the land carbon cycle from local to global scale (Manzoni & Porporato, 2009; Sierra, Ceballos-Núñez, et al., 2018). Individual models, such as the Community Land Model (CLM), also became increasingly sophisticated over time as additional processes and internal system feedbacks are incorporated (Lawrence et al., 2019). However, despite growing mechanistic details, model predictive skills have not improved much (Prentice et al., 2015). On the contrary, both model structures and predictions have become less tractable with increasing model complexity. Model intercomparison projects (MIPs) strive to unpack differences among models, the sensitivity of their predictions to drivers, and the processes responsible for such differences in model sensitivity and predictions. Results from nearly all MIPs suggest wide inter-model spread in responses of the terrestrial carbon cycle to climate change (Huntzinger et al., 2017). This large spread has motivated modelers to add more and more process details to address putative mechanistic “gaps” in the models. Paradoxically, the result has been even more divergent model structures, increased complexity, high computational costs, and low tractability (Huntzinger et al., 2017; Lawrence et al., 2018). Overall, modeling the land carbon cycle using current bottom-up modeling architectures faces several roadblocks, including (a) diverse and complex model structures that cannot be easily comprehended nor evaluated; (b) salient model uncertainty with its sources being difficult to identify, which impedes targeted model improvement; and (c) high computational cost that limits the use of rapidly accumulating big data to constrain model prediction. To overcome these roadblocks, it is imperative to develop new approaches to land carbon cycle modeling.
Over the past decades, we have developed a matrix approach in an effort to address the issues identified above via a complementary approach to bottom-up process modeling. The matrix approach helps gain simplicity in structure, clarity in diagnostics, and computational efficiency in spin-up. The latter is a common method to obtain steady states of carbon pools as the estimates of initial carbon pool sizes and is typically the most computationally intensive step in land carbon modeling. Reducing the computational load of spin-up opens the door for comprehensive evaluation and data assimilation to improve the predictive skill of the models. The matrix approach also offers a basis for theoretical analysis of general behavior of land carbon cycle under global change.
Matrix algebra was first used to study steady states and response times of the ocean's geochemical system, including calcium carbonate formation and turnover (Southam & Hay, 1976). Lasaga (1980) used the matrix equations to represent geochemical cycles and explore the stability of carbon and oxygen cycles. Sundquist (1985) performed eigenvalue analysis of matrix equations to examine the interrelationships among carbon cycle processes over a wide range of time scales. We first started organizing a land carbon cycle model in a matrix form to gain analytic spin-up before the model was used to examine the time courses of various processes in response to the elevated CO2 treatment in an ecosystem experiment (Luo et al., 2001). From this exercise, we found that the matrix representation is simple to understand, convenient for mathematical analysis, and computationally efficient for data assimilation (Luo et al., 2003; White et al., 2005; Xu et al., 2006). Other carbon cycle models, such as InTEC, SiB-CASA, and SiB4, were also analytically spun up to steady states using matrix algebra (Chen et al., 2003; Haynes, Baker, Denning, Stockli, et al., 2019; Haynes, Baker, Denning, Wolf, et al., 2019; Schaefer et al., 2008).
More recently, we further developed the matrix representation as a generalizable approach to land carbon cycle modeling. Xia et al. (2012, 2013) converted the Australian Community Atmosphere Biosphere Land Exchange (CABLE) model to one matrix equation, which was used to develop the semi-analytic spin-up (SASU) and traceability analysis. Huang, Lu, et al. (2018) recoded a vertically resolved soil carbon module, which is a part of CLM4.5, in the matrix form for attribution analysis. Huang, Zhu, et al. (2018) converted a version of the Organizing Carbon and Hydrology in Dynamic Ecosystems (ORCHIDEE) model to the matrix form to facilitate a variance-based sensitivity analysis. Lu et al. (2020) fully implemented a matrix approach to represent the interconnected carbon and nitrogen cycle system of CLM5. Meanwhile, the matrix equation has been used for theoretical studies, for example, categorizing decomposition models (Sierra & Müller, 2015), and diagnosing the predictability and dynamic disequilibrium of the land carbon cycle (Luo et al., 2015, 2017).
This paper reviews our efforts in development and applications of the matrix approach. Specifically, we show how the matrix approach unifies land carbon cycle models and improves understanding of the general, transient behavior of the land carbon cycle. We illustrate how the matrix approach is effective in addressing some contemporary issues in land carbon cycle modeling, such as pinning down sources of model uncertainty, accelerating spin-up, and constraining models with big data via data assimilation. While microbial models with nonlinear carbon transfers can also be represented in the matrix form (Sierra & Müller, 2015), this paper is focused on model schemes based on first-order kinetics, which is the most common approach in current land carbon cycle models. There are three reasons for this. First, first-order kinetics of carbon transfer have been supported in term of macroscopic patterns by almost all data sets we have synthesized from litter decomposition and soil incubation experiments (Cai et al., 2018; Luo, 2022; Schädel et al., 2014; Xu et al., 2016; Zhang et al., 2008), notwithstanding strong research community interest in the possibility of nonlinear kinetics emerging as a result of microbial community dynamics. Second, most Earth system models (ESMs) still use the classic first-order kinetics and that modeling community is a target audience for this paper. Third, it has been shown that any nonlinear model for which a solution is known can be expressed with an equivalent linear model that has the exact same solution (Metzler et al., 2018).
Unifying Land Carbon Cycle ModelsHundreds of models have been developed to simulate and predict carbon dynamics (Manzoni & Porporato, 2009). Individual models differ somewhat in their representations of carbon cycle processes, but all share fundamental features (Sierra & Müller, 2015). For example, all models consider carbon input through photosynthesis, partitioning of photosynthetic carbon among plant pools, and carbon transfer from plant to litter and soil carbon pools. Carbon cycling through the network of pools is determined by mass balance and strongly influenced by external forcing variables like temperature, moisture, and soil texture (Luo et al., 2003; Sierra & Müller, 2015). Thus, the land carbon cycle can be mathematically generalized as a nonautonomous, multi-compartmental system (Rasmussen et al., 2016; Sierra, Ceballos-Núñez, et al., 2018; Yang et al., 2011). Mathematically, a nonautonomous system is a dynamic system with parameters and inputs that vary with time. In contrast, an autonomous system is a system of ordinary differential equations with constant parameter values and inputs (i.e., a time-invariant system).
The system of the land carbon cycle can be represented by a set of ordinary differential equations in a matrix form. Equation B1 (Box 1) unifies land carbon cycle models by accommodating any number of pools (i.e., the number of elements in X(t) in Equation B1) and folding the carbon cycle processes that operate on those pools into five components of Equation B1: carbon input μ(t), plant carbon allocation B, carbon transfers A, environmental modifier ξ(t), and turnover rates K (Table 1). Note that vertically resolved soil carbon models have a sixth component, vertical transfer. For example, CLM5 with a vertically resolved soil carbon module includes 18 plant pools for each of the 17 vegetation types and 140 soil pools (i.e., 7 pools per layer over 20 layers) in X(t) (Lawrence et al., 2019). Thus, CLM5 assigns 158 elements (i.e., 18 for plant + 140 for soil) to X(t) for one grid cell if it is occupied by one vegetation type and 194 elements (i.e., 3 × 18 + 140) if the grid is occupied by three vegetation types (Lu et al., 2020). In contrast, the CABLE model has nine pools, therefore X(t) has nine elements (Wang et al., 2010).
Box 1. Matrix Representation of Land Carbon Cycle Models
The land carbon cycle is usually described by a set of differential equations, which can be re-organized in a matrix form (Luo & Weng, 2011; Luo et al., 2003, 2015, 2016) as: [Image Omitted. See PDF]where X(t) is a vector of pool sizes, B is a vector of partitioning coefficients from carbon input to plant pools, μ(t) is carbon input (e.g., Net Primary Production, NPP), A is a matrix with −1 in the diagonal and transfer coefficients in the off-diagonal, K is a diagonal matrix of process rates (mortality for plant pools and decomposition for litter and soil pools), and ξ(t) is a diagonal matrix of environmental modifiers to represent responses of carbon cycle processes to changes in temperature, moisture, or other factors. Parameters B, K, and A are constants in some models but may vary with time. With time-varying μ(t), ξ(t), B, A, and K, Equation B1 represents a multi-compartmental, nonautonomous, dynamical system.
When B, A, K and u(t) are functions of state variable X, Equation B1 can become [Image Omitted. See PDF]Equation B2 represents nonlinear kinetics and usually leads to complex patterns of carbon cycle dynamics (Wang et al., 2014, 2016) that are different from those defined by Equation B1.
It is conceptually possible that B, A, and K are functions of state variable X, but not much empirical evidence has been revealed to support this notion (Cai et al., 2018; K. Zhu et al., 2018; Schädel et al., 2013; Yang et al., 2011; Zhang et al., 2008). On the other hand, carbon input μ(t) is a function of foliage biomass during fast canopy expansion (Cui et al., 2019). This nonlinear kinetics is nullified once the canopy is developed. Nonlinearity in land carbon cycling induced by other mechanisms, such as seasonal fluctuation, acclimatory changes in photosynthetic capacity, and nonlinear responses to temperature and moisture, can be accounted for by Equation B1 via environmental modifiers ξ(t) and parameter changes (Luo & Schuur, 2020). For simplicity, most analysis in this paper is based on Equation B1. Analysis of nonlinear carbon cycle models such as Equation B2 can be found in, for example, Metzler et al. (2018) and Sierra, Ceballos-Núñez, et al. (2018).
Table 1 Examples of Models or Processes With Different Representations of the Five Terms in the Matrix Equation
Each of the five (or six) components of Equation B1 can be described either by fixed values, functions, or nested models. For example, carbon input μ(t) can be simulated by a “bottom-up” approach via scaling up leaf-level biochemical processes in a nested model (Collatz et al., 1992; Farquhar et al., 1980; Rogers et al., 2017). Alternatively, μ(t) can be simulated by a “top-down” approach based on the linear relationship between photosynthesis and canopy absorbed radiation (light-use efficiency; Running et al., 2004), as in the CASA model (Potter et al., 1993). Plant carbon allocation B has been represented with diverse schemes, including fixed coefficients, functional relationships, resource limitation and optimization, for different models (Ceballos-Núñez et al., 2020; De Kauwe et al., 2014; Xia et al., 2017). The carbon transfer elements of matrix A are usually fixed as constants or vary with plant traits or soil texture as in the CENTURY model (Xia et al., 2013). The diagonal matrix K represents rate processes of mortality and litterfall for plant pools, or organic matter decomposition for litter and soil pools; these rate processes are often parameterized as constants (Huang, Lu, et al., 2018; Xia et al., 2012). Environmental modifiers ξ vary greatly among models with different shapes of response functions that characterize the effect of changes in the environment on rate processes (Lei et al., 2018; Sierra et al., 2015).
These different levels of complexity lead to a hierarchy of models under one overarching theory (Sierra, Ceballos-Núñez, et al., 2018) (Figure 1). For example, the elements of the matrix K are usually constant in ESMs with static plant functional types (PFTs; Oleson et al., 2010), but are variable if a model has dynamic vegetation (Moorcroft et al., 2001; Sitch et al., 2003; Smith et al., 2001). To simulate dynamic vegetation, demographic models such as ED2, LPJ-GUESS, or FATES simulate plant physiology, competition, ecosystem assembly, and vegetation distribution (Fisher et al., 2015; Koven et al., 2020). Those processes ultimately influence the carbon cycle by changing values of elements in the matrix K as well as allocation coefficients in vector B, carbon influx μ(t), and, potentially, transfer coefficients A. Recent modeling efforts linking plant traits to tree demography, or microbial traits to carbon processes, make K a function of traits (Thomas et al., 2019). When the decomposition rates of soil organic matter are represented by Michaelis-Menten equations, as formulated in some microbial models, it generates a nonlinearity (Sierra & Müller, 2015; Wieder et al., 2015). Along this hierarchy of model complexity, turnover rates in matrix K change from constants to progressively complex functions of more and more processes (Figure 1).
Figure 1. Hierarchical representation of land carbon models with different levels of complexity. The extremely general representation of the carbon cycle that is physically possible using mass balance constraints (Sierra & Müller, 2015) (Level 1). The inputs to the system are captured by the vector S(X, t) that adds external C to all pools as a function of time and state variable X. The releases are the product of a compartmental matrix C(X, t) that accounts for interactions with state variable similarly as in Equation B2 in Box 1. Any term in the level 1 equation can be expanded to accommodate more details. For example, C(X, t) can be expressed (depicted by blue arrows) as either nonlinear kinetics or linear kinetics as related to process rates (K), donor-controlled transfer rates (A), and environmental modifiers (ξ(t)) (Level 2). These terms, together with carbon allocation (B) and time-dependent photosynthetic rates (μ(t)), form a more specific representation of carbon cycle models as in Equation B1. Each of these terms can be developed further in a hierarchical way. For example, the process rates K can be globally fixed values or vary with plant functional types (Level 3). The latter can be either static or dynamic (Level 4). The dynamic vegetation can be linked to either fixed parameter values for each vegetation type or plant and microbial traits (Level 5). This flexibility in the system representation allows any known land carbon cycle model architecture to be generalized under one theoretical framework without compromising detail in process representation.
Matrix models can be developed by reorganizing carbon balance equations in existing models, through reconstruction from model outputs, or from a flow diagram of element cycling. The CLM5 matrix model was developed by reorganizing hundreds of carbon and nitrogen equations from six modules into four matrix equations representing carbon and nitrogen cycles of vegetation and soil. This matrix version of CLM5 operates as a replica in parallel with the original version online or offline within the Community Earth System Model (CESM; Lu et al., 2020). Matrix models were reconstructed from numerical outputs of carbon stocks and fluxes among pools for LPJ-GUESS, ORCHIDEE, and ELM-ECA (Ahlström et al., 2015; Metzler et al., 2020; Wu et al., 2020). In addition, a new matrix model was developed directly from a phosphorus flow diagram (Hou et al., 2019). The time needed for developing a matrix representation varies greatly, depending on the complexity of the original model. For example, a team of three (E. Hou, X. Lu, and Y. Luo) worked together for less than 1 day to develop a running phosphorus matrix model from a flow diagram. In comparison, the development of the matrix model for CLM5-CN by Lu et al. (2020) took more than 1 year of careful work to guarantee its compatibility with the overall framework of CESM2. To date, dozens of existing models have been converted into the matrix form, including CLM3.5, CLM4, CLM4.5, CLM5, ORCHIDEE, CABLE, LPJ-GUESS, and nonlinear microbial models (Ahlström et al., 2015; Huang, Lu, et al., 2018; Huang, Zhu, et al., 2018; Lu et al., 2020; Sierra & Müller, 2015; Wu et al., 2020; Xia et al., 2012). Four coupled carbon-nitrogen cycle models, CABLE, CLM5, TECO, and GECO, were converted into the matrix form (Lu et al., 2020; Shi et al., 2016; Wang et al., 2022; Xia et al., 2012). A fully coupled carbon-nitrogen-phosphorus model, TECO-CNP, was developed in the matrix form to study effects of nutrient limitations on subtropical forests (Du et al., 2021). The only dynamic vegetation model yet to be converted to the matrix form is LPJ-GUESS, in which vegetation pools and fluxes were lumped together to estimate vegetation turnover (Ahlström et al., 2015). While this was sufficient for the purposes of the target study, it is feasible to represent PFTs and demographic dynamics comprehensively within the structure of the matrix equation. Indeed, demographic models are often expressed in a matrix form (Caswell, 2001; Tuljapurkar & Caswell, 1997).
Once developed, matrix models allow for new processes to be incorporated, as any process that influences carbon cycle must be related to one or more of the five (or six) components of the matrix equation (Table 1). For example, vegetation dynamics, which usually result from competition among plants with different strategies, influence carbon input rate μ(t), plant allocation B, mortality and litter decomposition rates K, and carbon transfer A due to litter quality changes for different vegetation types (Fisher et al., 2015; Koven et al., 2020; Moorcroft et al., 2001; Smith et al., 2001). Nitrogen and phosphorus cycle processes primarily modify the rates of photosynthesis μ(t), allocation B, transfer A, and mortality and decomposition rates K (Q. Zhu et al., 2019; Wang et al., 2010), but do not directly affect the environmental modifier ξ(t). Trait-based modeling has the potential to link plant and microbial traits to photosynthesis, allocation, mortality, decomposition, transfer, rooting depth, and plant and microbial responses to environmental variables (Fry et al., 2019; Thomas et al., 2019). Thus, it may influence all five components (six if vertical mixing is resolved) of the matrix equation. The newly incorporated processes can be evaluated for their impacts on modeled carbon cycle dynamics using a traceability analysis as discussed in Section 4.1 (e.g., Liao et al., in prep.).
Understanding General Behavior of Land Carbon Cycle Under Global ChangeDespite extensive research using modeling, experimental, and observational approaches, key gaps in our theoretical understanding of the land carbon cycle have remained for the past six decades since Olson (1963) characterized steady-state dynamics at which the carbon storage capacity equals the product of carbon input and residence time. However, it is unlikely that the global carbon cycle has ever been at steady state. In contrast, a constantly changing environment keeps the carbon cycle in a dynamic disequilibrium. Since pre-industrial times, the most important factor behind this disequilibrium is that human activities result in substantial release of CO2 into the atmosphere from land cultivation and fossil fuel burning (Luo & Weng, 2011). As a nonautonomous, multi-compartmental system (Rasmussen et al., 2016; Sierra, Ceballos-Núñez, et al., 2018), how land ecosystem carbon balance varies under dynamic disequilibrium has not been theoretically well explored.
The matrix equation as a generic expression of carbon cycle model structure makes it possible to study the general behavior of the land carbon cycle. The matrix equation B1 in Box 1 can be rearranged to describe carbon storage capacity and carbon storage potential (Luo et al., 2017) as: [Image Omitted. See PDF] [Image Omitted. See PDF] [Image Omitted. See PDF]where X′(t) equals dX(t)/dt in Equation B1 and represents net pool change, τch is chasing time, and τE(t) is ecosystem residence time. Chasing time τch measures how fast the net C pool change, X′(t), is redistributed through the network of multiple carbon pools (Luo et al., 2017). Ecosystem residence time τE(t) equals: [Image Omitted. See PDF]
Xc(t) is the land carbon storage capacity. It can be expressed as: [Image Omitted. See PDF]
Xp(t) is the land carbon storage potential. It can be described as: [Image Omitted. See PDF]
The carbon storage capacity is the maximal amount of carbon that a land ecosystem can store. Traditionally, the storage capacity has been considered as a static concept that results from the balance between carbon input and decomposition when an ecosystem is approximately at steady state (Olson, 1963) (Figure 2a). For example, long-term averaged carbon stock in the soil is lower in tropical than boreal forests, due to faster decomposition rates (i.e., shorter residence time) in tropical soils, notwithstanding higher carbon input from litterfall and plant mortality (Crowther et al., 2019). When this definition is applied to an ecosystem with multiple carbon compartments under non-steady state (i.e., a time-dependent, nonautonomous system), the carbon storage capacity is no longer a static constant; instead, it becomes time-dependent, as both carbon input and residence time vary with time. Regardless, it still sets the maximal amount of carbon that an ecosystem can store at time t. Thus, carbon storage capacity represents an instantaneously driving state that carbon storage X(t) chases, varying over seasons of a year (Figure 2b) and with climate change over decades (Figure 2c). When the capacity Xc(t) is larger than the carbon storage X(t) itself, carbon sequestration occurs and carbon stock increases; otherwise, the ecosystem loses carbon and carbon stock decreases.
Figure 2. Transient carbon cycle dynamics as determined by carbon storage capacity and potential on different time scales. Panel (a) is for an autonomous system in which carbon storage capacity (Xc) is a constant while carbon storage potential (Xp(t)) and carbon storage itself (X(t)) vary with time. In the illustrated case, the capacity is assumed to abruptly increase by 40%, mainly due to instantaneous increase in carbon input as in a CO2 experiment (Luo & Reynolds, 1999). Consequently, the potential immediately increases and then gradually declines as X(t) increases toward equilibrium. Panel (b) illustrates time-dependent X(t), its capacity Xc(t) and potential Xp(t) in a nonautonomous system over days of year (DOY; Rasmussen et al., 2016). Seasonal change in the capacity is due to change in carbon input, which is low in winter and high in summer. Panel (c) illustrates time-dependent X(t), its capacity and potential in a nonautonomous system over years in response to climate change (Luo et al., 2017). Carbon input and residence time, which jointly determine the capacity, fluctuate over the year and change with climate change, resulting in strong fluctuation in the storage capacity while X(t) of multiple pools smoothly changes over time (Rasmussen et al., 2016). In panels (b and c), the capacity is a driving state that X(t) chases. The rate of chasing is proportional to the storage potential.
The land carbon storage potential or its variants (e.g., carbon sink or sequestration potential) is a term that has been used in various contexts (Hudiburg et al., 2009; Zhou et al., 2015), but it has not been rigorously defined before. The matrix equation yields a precise definition for and offers a biogeochemical interpretation of the carbon storage potential (Equation 4). It is the amount of carbon that an ecosystem can take up to reach the carbon storage capacity and is a measure of the magnitude of disequilibrium (Luo & Weng, 2011). Thus, carbon storage potential represents the internal capability of an ecosystem to equalize its current carbon storage with the carbon storage capacity. Biogeochemically, it describes the redistribution of net carbon pool change, X′(t), of individual pools at each time step through networked pools connected by carbon transfers from one pool to the others through all the pathways, each pool having its distinct turnover time (Luo et al., 2017). The time to fully equalize the current carbon storage with the capacity through the matrix of pools is measured by chasing time, which can be statistically approximated by the slope of regression between the carbon storage potential and net ecosystem production (NEP; Jiang et al., 2017; Luo et al., 2017).
Both the carbon storage capacity and potential are regulated by ecosystem-internal processes that govern carbon influx and residence time. Carbon influx represents the combined effect of all the processes at biochemical, leaf, and canopy scales that influence plant photosynthesis. Many other processes influence residence time. For example, an increase in plant mortality results in a decrease in residence time and consequently carbon storage. An increase in microbial carbon use efficiency, has the contrary effect of prolonging residence time and carbon sequestration. External forcing variables, such as cyclic changes in light and temperature, long-term directional changes in global climate, disturbance events, disturbance regime shifts, and state changes, influence the carbon storage capacity and potential through changes in carbon input, plant carbon allocation, rate processes and transfers among plant and soil pools (Luo et al., 2015; Luo & Weng, 2011).
The carbon storage potential mathematically integrates fluxes X′(t) with time τch(t) (see Equation 4). It quantifies how net ecosystem exchange (NEE; X′(t)), an instantaneous measure of carbon fluxes at a given time, is transformed through a network of multiple pools. The time scale of NEE transformation within the multi-pool network is measured by chasing time τch(t) (Luo et al., 2017). Carbon chasing time in the multi-pool network cannot be easily measured in the field, as the data required to estimate turnover time (equaling to an inverse of elements in matrix K) of individual carbon pools and transfer coefficients among them (element in matrix A) are usually not available. Nonetheless, the carbon storage potential can be estimated from multiple datasets. For instance, radiocarbon 14C has been used to quantify mean carbon ages of various plant, litter, and soil pools (Gaudinski et al., 2000; Shi et al., 2020). Soil respiration and its isotopic signature can be used to estimate the age of the release flux out of an ecosystem, which is a measure of turnover time (K−1) (Sierra, Hoyt, et al., 2018; Sierra et al., 2012; Trumbore et al., 1996). Carbon contents in various pools—leaf, wood, root, litter, and soil—contain information on timescales of carbon cycling. Together, those datasets have been used to constrain multi-pool soil carbon models through data assimilation (Liang et al., 2018; Xu et al., 2006). Therefore, they could be used to estimate the carbon storage potential. Note that carbon sequestration between two time points is a result of changes in both the carbon storage capacity and potential (Jiang et al., 2021).
Overall, the matrix equation offers a new foundation for understanding the general behavior of the land carbon cycle and quantifying carbon uptake or loss by ecosystems at different timescales.
Addressing Contemporary IssuesOnce models are expressed in the matrix form, we can address a range of perennial issues in land carbon cycle modeling, such as sources of model uncertainty, computational efficiency of spin-up for complex models, and uses of big data to constrain model predictions.
Pinning Down Sources of Model Uncertainty With Traceability AnalysisDespite extensive research and model development over the past 20 years, the model uncertainty (i.e., across-model spread) in ESM projections of the global land C sink has remained large from the Third Assessment Report of the Intergovernmental Panel on Climate Change published in 2001 to the sixth report published in 2021 (Canadell et al., 2021; Ciais et al., 2014; Denman et al., 2007; Prentice et al., 2001). To address the model uncertainty issue effectively we must understand what causes it. The matrix approach offers a unique method to precisely pin down sources of model uncertainty.
Once models are expressed in the unified matrix form, the model-to-model variations can be easily traced to differences between models in the five (or six) terms in Equation B1. For example, the model-model differences can be shrunk to zero (i.e., identical predictions) among eight models if the terms in Equation B1 are standardized by adjusting response functions ξ(t) and changing values of parameters B, A, and K to yield the same outputs for each of the terms (Hou et al., submitted) (Figure 3). This manipulation demonstrates that the matrix approach makes model uncertainty a tractable issue: through simple mathematical adjustments in the matrix equation terms, we can analytically and precisely track the model uncertainty down to its sources. To do so, we developed a traceability analysis based on the mathematical properties of the matrix equation. The traceability analysis first decomposes simulated carbon pool sizes at a given time t0 X(t = t0) into carbon storage capacity Xc(t = t0) and carbon storage potential Xp(t = t0) according to Equation 1c. Xc can be further decomposed to carbon residence time τE(t = t0) and net primary production μ(t = t0) according to Equation 3. Meanwhile, Xp(t = t0) can be decomposed to chasing time τch(t = t0) and NEP X′(t = t0) according to Equation 1b. The components (Aξ(t = t0)K)−1 in τE(t = t0) or τch(t = t0) can be further decomposed to A−1, ξ(t = t0)−1, and K−1, which are related to parameters or environmental variables.
Figure 3. Spreading and shrinking model uncertainty. We developed a matrix-based model intercomparison project by expressing eight land carbon cycle models (i.e., TEM, CENTURY, DALEC2, TECO, FDBC, CASA, CLM4.5, and Organizing Carbon and Hydrology in Dynamic Ecosystems [ORCHIDEE]) in a unified matrix form. The eight models differ greatly in complexity, with the number of carbon pools ranging from 2 in TEM to 101 in ORCHIDEE. All eight models were driven by the same gross primary production (GPP) and environmental variables (temperature, T and precipitation, P) from a peatland in northern Minnesota, USA. Despite the same forcing data, the variations among the models are large for cumulative net ecosystem production (NEP). The model uncertainty shrinks to almost zero with nearly identical predictions by the eight models when we standardize these parameters in the matrix equations related to plant carbon partitioning coefficients B, transfer coefficients A, environmental scalar ξ(t), vertical mixing coefficients, and decomposition coefficients K (Hou et al., submitted).
The traceability analysis can place any modeling outputs in a 3-dimension space ordinated by carbon input, residence time, and carbon storage potential to understand how modeled land carbon storage differs across an ensemble of models, regardless of differences in model structure and external forcing variables (Jiang et al., 2017; Luo et al., 2017; S. Zhou et al., 2018; Wu et al., 2020). Once model-model differences are identified, sources of uncertainty in model predictions can be tracked down hierarchically to model structures, parameters, forcing variables as well as influences of different initial values in simulations after spin-up (Jiang et al., 2017; Luo et al., 2017; Xia et al., 2013). For example, driven with similar forcing data, CLM3.5 predicted ∼31% larger carbon storage capacity than CABLE (Figure 4). Rafique et al. (2016) used the matrix-based traceability analysis to show that this difference is due to 37% higher NPP and 11 years shorter ecosystem residence time in CLM3.5 than these in CABLE (Figure 4). The latter results from shorter baseline carbon residence time for the woody biomass pool in CLM3.5 than CABLE (14 years in CLM3.5 vs. 23 years in CABLE), and a lower proportion of NPP allocated to woody biomass in CABLE (16%) than CLM3.5 (23%). The traceability analysis has also been successfully applied to understand the spread in carbon storage capacity among different biomes in CABLE (Xia et al., 2013) and the different responses of Harvard and Duke Forests to climate change (Jiang et al., 2017).
Figure 4. Clarity in diagnostics enabled by the matrix equation. A traceability analysis shows CLM3.5 has a higher carbon storage capacity at steady state (Xss), due to higher NPP than CABLE. However, carbon residence time (τE) is higher in Community Atmosphere Biosphere Land Exchange (CABLE) than Community Land Model 3.5 (CLM 3.5) mainly due to difference in the baseline carbon residence time (τEʹ ${\tau }_{E}^{\prime}$). The environmental scalar ξ, including its temperature (ξT) and water components (ξw), is similar between the two models. CABLE has higher τE than CLM 3.5 mainly because CABLE allocates more carbon to wood growth than CLM 3.5 (Rafique et al., 2016). Model differences in NPP can also be traced to processes as shown in other studies (Cui et al., 2019; Xia et al., 2015). Overall, the matrix approach offers clarity in diagnostics to track model differences to model structure and/or parameter values. Note that symbols in inserted figures represent different vegetation types.
Once the main sources of model uncertainty are identified, we can target model improvement efforts toward the parameters, processes or feedbacks responsible for that uncertainty, using empirical data to inform more realistic model structure (Luo et al., 2012) and parameterization (Luo & Schuur, 2020). More accurate model parameterization can also be achieved by data assimilation, which is described in Section 4.3.
Accelerating Spin-Up With Semi-Analytic Spin-UpSpin-up in biogeochemical models is a common procedure to estimate initial values of carbon pool sizes. Spinning up a complex biogeochemical model is computationally very expensive. For example, it takes more than 35 days on Supercomputer Cheyenne to run CLM5 to steady state using the native dynamic method, or about 1 week using the accelerated decomposition (AD) method (Lawrence et al., 2019). The computationally expensive spin-up procedure takes resources away from some critical analyses, such as parameter sensitivity analysis, that could inform improvements in CLM5 or similar models. An efficient spin-up method can greatly liberate models from this computational burden.
Once a model is expressed in a matrix form, the steady-state pool sizes at a given time t = t0 can be analytically obtained by letting Equation B1 equal to zero: [Image Omitted. See PDF]
Equation 5 can then be solved to obtain steady-state pool sizes as: [Image Omitted. See PDF]
In practice, we usually use multi-year forcing data to run a model repeatedly until slow pools reach quasi-equilibrium. When values of forcing variables change, μ(t) and ξ(t) in Equation 6 take the new values, leading to different Xss. Thus, we need to solve Equation 5 many times to obtain Xss at quasi-equilibrium of slow pools when multi-year forcing data are used to run a model repeatedly (Xia et al., 2012). This is the general procedure of SASU.
SASU has been applied to the carbon-only and coupled carbon-nitrogen versions of CABLE and saved about 92.4% (i.e., 12.2 times faster) and 86.6% (i.e., 7.5 times faster) of the computational time, respectively, in comparison with the default spin-up method (Xia et al., 2012). When the SASU method is implemented into CLM5, it is 49.6 times faster than the native dynamic method and 8.0 times faster than the AD method (Liao et al., in prep.). The AD method is the default spin-up method in CLM5 (Lawrence et al., 2019). It uses a set of acceleration factors to reach steady-state pool sizes (Thornton & Rosenbloom, 2005). Note that forward modeling with matrix equations can be computationally very slow. However, this issue can be overcome by using a sparse matrix algorithm (Lu et al., 2020). Nevertheless, the greatly enhanced computational efficiency of SASU compared to traditional spin-up enables computational resources to be redeployed toward parameter sensitivity analyses (Huang, Zhu, et al., 2018) and data assimilation with complex carbon cycle models (Hararuk et al., 2014, 2015; Shi et al., 2018).
Constraining Complex Models With Big Data by Matrix-Enabled Data AssimilationThe matrix approach makes it computationally feasible to assimilate multiple data sets to constrain the predictions of land carbon sequestration. Presently, land carbon dynamics simulated by ESMs deviate substantially from observation-based estimates. For example, global soil carbon storage simulated by 11 models participating in the Coupled Model Intercomparison Project phase 5 (CMIP5) fit poorly with the soil organic carbon stocks reported in the Harmonized World Soil Database (HWSD; Luo et al., 2015). Observational and experimental research produces ever-larger quantities of high-resolution, multi-dimensional data, offering great resources to constrain model prediction via data assimilation (Luo et al., 2011; Reichstein et al., 2019; Xia et al., 2020). In the past, flux-based data have been used to improve model predictive skills (Fox et al., 2018; MacBean et al., 2018; Rayner et al., 2005). While pool-based data are essential to constrain rate processes (Xu et al., 2006), their assimilation into ESMs is inhibited by unattainable computational expense.
The SASU method derived from the matrix approach makes it computationally feasible to assimilate both flux- and pool-based data to constrain predictions of land carbon sequestration from full dynamic models such as ESMs (Hararuk et al., 2014, 2015; Liang et al., 2018; Shi et al., 2018; Tao et al., 2020). When the matrix version of CLM3.5 was used in data assimilation to constrain its parameters, the model-data fit was improved to the extent that modeled soil carbon density accounted for 41% of variation in the HWSD database, in comparison to 27% without data assimilation (Hararuk et al., 2014). We have recently used the matrix version of the CLM5 soil carbon module in PROcess-guided deep learning and DAta-driven modeling (PRODA) to further improve predictions of soil carbon variability at regional and global scales. With PRODA, CLM5 can explain 62% of variation in soil organic carbon content in the conterminous USA (Tao & Luo, 2022; Tao et al., 2020) (Figure 5). Data assimilation can help not only improve model performance but also evaluate the relative contributions of model structure and parameterization, versus data, to model uncertainty (Shi et al., 2018). Such data assimilation work at regional or global scales has become possible thanks to the computational savings enabled by the accelerated spin-up facilitated by the matrix approach.
Figure 5. The agreement between observed and modeled SOC content for the conterminous USA with different approaches. CLM5 simulations of SOC with default parameters explain 32% of variation in 25,444 observed vertical profiles of SOC content at different depths over the conterminous USA (a). When CLM5 matrix model was used in data assimilation to constrain model parameters, 43% of variation in observations was explained (b). When CLM5 matrix model was trained by the PROcess-guided deep learning and DAta-driven modeling (PRODA) approach to constrain and predict spatially varying model parameters, explained variation rose to 62% (c) (Tao & Luo, 2022).
We have shown that the matrix approach can unify land carbon cycle models. The unified matrix equation represents the dynamic disequilibrium of the land carbon cycle in terms of three overarching variables: carbon input, residence time, and carbon storage potential. Carbon input and residence time together quantify the carbon storage capacity at equilibrium, whereas the storage potential measures the magnitude of disequilibrium. The matrix equation folds internal processes and external variables into one framework to yield a system-level representation that facilitates understanding and analysis of the carbon cycle in all its complexity.
We have demonstrated the feasibility of converting different kinds of land carbon, nitrogen, and phosphorus cycle models to matrix equations. The result is simplicity in structure, high modularity in code, clarity in diagnostics, and computational efficiency in spin-up. Once expressed in the matrix form, all models regardless of complexity are organized in a standardized yet concise form that allows us to understand differences in structures and parameters among models, and to trace how these affect their outputs.
The matrix approach makes it possible to pinpoint sources of uncertainty in model predictions via traceability analysis. Pinpointing sources of model uncertainty across ensemble members of MIPs helps direct targeted efforts on model improvement. The fast spin-up enabled by the matrix approach makes it possible to capitalize on ever-increasing availability of empirical data by assimilating multiple datasets into complex models to improve prediction. The integration of big data with multiple models likely presents the best hope to accurately characterize the current state of the land carbon cycle and predict the future trajectory of terrestrial carbon sequestration.
Benefits of converting a traditional carbon cycle model to the matrix form might be limited if the original model is relatively simple and easily understood, spin-up is fast enough, and/or there is no need to calculate the new analytics offered by the matrix approach for understanding simulation outputs. Converting existing models, especially complex schemes used in current generation ESMs, to matrix models could be time-consuming and, thus, requires a considerable initial investment before benefits are gained. Although nonlinear microbial models can be expressed in the matrix form, it is not clear which advantages of the matrix approach would be gained. It is yet to be fully explored how best the matrix approach is applied to dynamic vegetation models. While the matrix approach empowers researchers to study the land carbon cycle, there are still many processes that are insufficiently understood or quantified, theoretically and empirically, and to be accurately represented in models. Ultimately, concerted collaboration among experimental, observational, and modeling research communities is a precondition for substantive progress. The matrix approach offers a toolbox of novel analytical and technical solutions that can accelerate this collaborative work towards an improved understanding of the land carbon cycle, its drivers, and its role within climate dynamics.
AcknowledgmentsThe authors appreciate constructive comments and suggestions offered by Drs. Dafeng Hui, Bruce Hungate, Nate McDowell, Andrew Richardson, Edward Schuur, and Kees Jan van Groenigen on various versions of this paper. Research leading to this publication has been financially supported partially by US National Science Foundation (DEB 1655499, DEB 2017884), US Department of Energy (DE-SC0020227), and the subcontracts 4000158404 and 4000161830 from Oak Ridge National Laboratory (ORNL) to Y. Luo, and the German Research Foundation (Emmy-Noether Programme, SI 1953/2-1) to C. A. Sierra. The study is a contribution to the research environment Biodiversity and Ecosystem Services in a Changing Climate (BECC) at the Lund University.
Data Availability StatementNo original data or code was used in this manuscript.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2022. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Land ecosystems contribute to climate change mitigation by taking up approximately 30% of anthropogenically emitted carbon. However, estimates of the amount and distribution of carbon uptake across the world's ecosystems or biomes display great uncertainty. The latter hinders a full understanding of the mechanisms and drivers of land carbon uptake, and predictions of the future fate of the land carbon sink. The latter is needed as evidence to inform climate mitigation strategies such as afforestation schemes. To advance land carbon cycle modeling, we have developed a matrix approach. Land carbon cycle models use carbon balance equations to represent carbon exchanges among pools. Our approach organizes this set of equations into a single matrix equation without altering any processes of the original model. The matrix equation enables the development of a theoretical framework for understanding the general, transient behavior of the land carbon cycle. While carbon input and residence time are used to quantify carbon storage capacity at steady state, a third quantity, carbon storage potential, integrates fluxes with time to define dynamic disequilibrium of the carbon cycle under global change. The matrix approach can help address critical contemporary issues in modeling, including pinpointing sources of model uncertainty and accelerating spin‐up of land carbon cycle models by tens of times. The accelerated spin‐up liberates models from the computational burden that hinders comprehensive parameter sensitivity analysis and assimilation of observational data to improve model accuracy. Such computational efficiency offered by the matrix approach enables substantial improvement of model predictions using ever‐increasing data availability. Overall, the matrix approach offers a step change forward for understanding and modeling the land carbon cycle.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details











1 Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, USA; Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA; Now at School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
2 CSIRO Oceans and Atmosphere, Aspendale, VIC, Australia
3 Max Plank Institute of Biogeochemistry, Jena, Germany; Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
4 Center for Global Change and Ecological Forecasting, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, China
5 Department of Physical Geography and Ecosystem Science, Lund University, Lund, Sweden
6 Joint Innovation Center for Modern Forestry Studies, College of Biology and the Environment, Nanjing Forestry University, Nanjing, China
7 Northern Forestry Centre, Canadian Forest Service, Natural Resources Canada, Edmonton, AB, Canada
8 Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
9 Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, USA
10 Department of Earth System Science, Ministry of Education Key Laboratory for Earth System Modeling, Institute for Global Change Studies, Tsinghua University, Beijing, China
11 School of Atmospheric Sciences, Sun Yat‐sen University, Guangzhou, China
12 The Institute for Environmental Genomics, University of Oklahoma, Norman, OK, USA
13 Department of Physical Geography and Ecosystem Science, Lund University, Lund, Sweden; Hawkesbury Institute for the Environment, Western Sydney University, Richmond, VIC, Australia