
Introduction

Land-atmosphere CO₂ exchange is the pathway through which about 30% of anthropogenic CO₂ emissions are taken up by terrestrial ecosystems (Friedlingstein et al., 2023). It is hence a crucial flux for understanding the future evolution of atmospheric CO₂ concentration under global warming. Despite its importance, current model projections of land-atmosphere CO₂ exchange remain highly uncertain. Climate models with a coupled carbon cycle, or Earth system models, simulate diverging evolutions of land-atmosphere CO₂ exchange under different climate change scenarios, in magnitude and even in sign (Arora et al., 2020; Friedlingstein et al., 2014; Heimann & Reichstein, 2008), which in turn makes air temperature projections more uncertain (Bodman et al., 2013; Booth et al., 2012).

Reducing the uncertainty in projections of land-atmosphere CO₂ exchange requires better knowledge of the regions that drive the CO₂ exchange and its variations (Ahlström et al., 2015). In this context, regional contributions have received growing research interest. Nevertheless, the regional contribution to the seasonality of global land-atmosphere CO₂ exchange has received less attention. The seasonal variability in global gross CO₂ uptake by terrestrial vegetation (i.e., gross primary productivity, GPP) has been found to be driven by the Northern mid-to-high latitudes (e.g., Chen et al., 2017), and the same is expected for terrestrial carbon release (i.e., ecosystem respiration, RECO) owing to the strong coupling between carbon uptake and release across biomes (e.g., Migliavacca et al., 2015). For the global net CO₂ flux between land and atmosphere (i.e., net ecosystem exchange, NEE), three inversion models and dynamic global vegetation models (DGVMs) agreed that terrestrial ecosystems in the Northern Hemisphere drive its seasonal variability (e.g., Piao et al., 2020; Seiler et al., 2022). However, the governing processes behind it remain uncertain due to complex interactions among the carbon cycle, hydrology, and climate (e.g., Liu et al., 2020).

Moreover, regarding the interannual variability (IAV) in NEE, studies reach divergent conclusions about regional contributions (Piao et al., 2020). One atmospheric inversion product identified the humid tropics as the dominant region, whereas an ensemble of data-driven models contradicted this by showing stronger IAV in arid regions (Marcolla et al., 2017). Poulter et al. (2014) examined CO₂ fluxes estimated using an atmospheric inversion, a dynamic global vegetation model, and a global carbon budget accounting method and showed that semi-arid regions in the Southern Hemisphere dominated the IAV in global net land-atmosphere CO₂ exchange. This conclusion was corroborated by other studies (e.g., Ahlström et al., 2015; Zhang et al., 2018). However, some studies emphasize humid tropical regions as significant contributors to the IAV. For instance, Piao et al. (2020) conducted a spatial attribution analysis using NEE estimates from various approaches, including atmospheric inversion models, data-driven upscaled estimates, and DGVMs; all the methodologies they examined agreed that the contributions of the semi-arid tropics and the humid tropics are not qualitatively different. M. Wang et al. (2022) used a satellite-driven process-based ecosystem model and found that evergreen broadleaf forests, which prevail in the humid tropics, dominate the global NEE IAV. Furthermore, indirect evidence underscores the importance of humid tropical regions. Humphrey et al. (2021) analyzed Earth system model simulations and found that the global NEE IAV is primarily driven by land-atmosphere interactions, whose hotspots are located in semi-arid and tropical regions, indicating that these regions are also hotspots of the global NEE IAV.

Such a spread of conclusions, particularly among studies using DGVMs, stems in part from the complexity of the underlying biogeochemical mechanisms and the associated parameter uncertainties (Luo et al., 2015). As model development progresses, model structures become more complex, with a greater number of parameters (J. B. Fisher et al., 2014; R. A. Fisher & Koven, 2020). However, many of these parameters are determined empirically, without a sound theoretical basis or sufficient observational support within the simulation domain (Famiglietti et al., 2020; Feng, 2020; Prentice et al., 2015), which leaves the parameters, and thus the simulations, more uncertain. A promising alternative is to use observation-based products to inform parameter calibration (Keenan et al., 2012; Niu et al., 2017; Williams et al., 2009). Previous studies demonstrated the efficacy of model-data integration in simulating water and carbon cycles (e.g., Bloom et al., 2016; Lee et al., 2023; Trautmann et al., 2018, 2022). Levine et al. (2023) applied a model-data integration framework to show that the humid tropics dominate the IAV of pan-tropical land carbon uptake, demonstrating the potential of model-data integration methods for understanding global carbon cycle variability.

Here, we diagnose gridwise and regional contributions to the seasonal and interannual variability of global land-atmosphere CO₂ exchange. To this end, we develop a process-based diagnostic biogeochemical model of the carbon and water cycles within the Strategies to INtegrate Data and BiogeochemicAl moDels (SINDBAD) framework. SINDBAD aims to reproduce patterns in various observational data streams of the carbon and water cycles through extensive calibration of a parsimonious model structure. After calibration, we evaluate the model against the observation-based products used for calibration. Finally, we use the modeled CO₂ fluxes for a spatial attribution analysis. We expect that our data-driven process-based model will provide an additional perspective on the spatial contributions to global CO₂ flux variability and improve its understanding. We specifically address the following questions:

How well can a parsimonious process-based diagnostic model simulate observed seasonal and interannual variability in gross and net CO₂ fluxes compared to state-of-the-art prognostic models?
Which regions drive seasonal and interannual variability of the global CO₂ flux?

Methods and Data

Figure 1 provides a visual overview of the workflow of this study. We build a simple process-based model structure to simulate CO₂ exchange between land and atmosphere (Section 2.1). The model structure builds on the hydrological model of Trautmann et al. (2022), into which we incorporate a new set of carbon cycle process formulations. The model is driven by a set of meteorological variables and constrained by available observation-based data products (Section 2.2; see also Appendix A in Lee et al., 2023). During model calibration, parameters are optimized at the regional scale to find the set that produces the best fit between simulated and observed carbon and water pools and fluxes (Section 2.3). A single, spatially and temporally constant value is determined for each parameter, so the parameters are global constants. Using the optimized parameter set, a forward run of the model produces daily time series of pools and fluxes for each 1° grid cell. We evaluate the model performance for carbon fluxes across spatial (i.e., global and regional) and temporal (i.e., seasonal and interannual) scales (Section 2.4). Finally, we attribute the variance in the global fluxes to grid cells and regions using a covariance matrix analysis to understand the spatial pattern of the global CO₂ fluxes (Section 2.5).
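The attribution itself is specified in Section 2.5. As a hedged illustration only, the sketch below shows one common covariance-based decomposition, assuming the global flux is the plain sum of regional (or grid-cell) fluxes. The function names and the toy series are hypothetical, and the study's exact formulation may differ.

```python
from __future__ import annotations

from statistics import fmean


def covariance(a: list[float], b: list[float]) -> float:
    """Population covariance of two equally long series."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def variance_shares(regional: dict[str, list[float]]) -> dict[str, float]:
    """Share of the variance of the summed (global) flux attributed to each region.

    Because var(sum_i x_i) = sum_i cov(x_i, sum_j x_j), the shares
    cov(x_i, X) / var(X) add up to 1; a negative share means that the
    region tends to dampen the global variability.
    """
    total = [sum(values) for values in zip(*regional.values())]
    var_total = covariance(total, total)
    return {name: covariance(x, total) / var_total for name, x in regional.items()}


# Hypothetical flux anomalies for two regions over four time steps.
shares = variance_shares({"north": [1.0, -1.0, 2.0, -2.0], "south": [0.5, -0.5, 0.0, 0.0]})
print(shares)  # north ≈ 0.88, south ≈ 0.12
```

Because the shares are covariances with the global total rather than regional variances, a region with large but out-of-phase variability can receive a small or even negative share.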

[Figure 1. Workflow of this study; image omitted, see PDF.]

Model Description

SINDBAD is a model-data integration framework (MDIF) that combines representations of land biogeochemical processes with an array of observational data sets to estimate model parameters. Note that SINDBAD is not a specific model but an MDIF that allows multiple configurations of model structure, observational constraints, and other settings (e.g., Trautmann et al., 2018, 2022). For simplicity, we refer to the specific configuration of SINDBAD used in this study as SINDBAD. SINDBAD is designed for parsimonious model structures, in which the main processes are represented by relatively simple equations with comparatively few effective parameters. This parsimony improves parameter identifiability, the interpretability of simulation results, and computational speed. The CARDAMOM approach (Bloom et al., 2016) follows a similar philosophy but differs from SINDBAD in many ways, including the model structure, the observational constraints, and the model-data integration strategy. For example, CARDAMOM performs Bayesian optimization of parameters for each grid cell, whereas SINDBAD calibrates a single set of globally constant parameters. Compared with traditional DGVMs (i.e., process-based models without parameter calibration), MDIFs facilitate model calibration as well as formal scrutiny of how well the model structure captures observed patterns. Hence, their strength lies in diagnostic assessments of the past within the domain covered by observational data streams. Effectively, MDIFs facilitate a joint interpretation and explanation of patterns in diverse and heterogeneous data streams. On the other hand, MDIFs risk overfitting to a specific spatiotemporal domain, so that extrapolation to, for example, future climate conditions is conceptually questionable.
Such extrapolation requires either accounting for the dynamics of parameters in space and time, or a modeling strategy rooted more strongly in representing detailed and complete process understanding, or both. The latter strategy is followed by the traditional process-based land surface and vegetation models included in TRENDY, which involve complex structures, inherent uncertainties in model design, and a lack of calibration.

The model structure of the SINDBAD configuration used in this study extends the hydrological model of Trautmann et al. (2022) with carbon cycle components, including gross primary productivity (GPP), ecosystem respiration (RECO), and net ecosystem exchange (NEE). Here, we describe the general flow of water (Section 2.1.1) and carbon (Section 2.1.2) through the model and the associated formulations.

Water Cycle

Figure 2 shows the structure of the model. The general flow of water cycle components follows Trautmann et al. (2022). Precipitation is partitioned into solid snowfall and liquid rainfall based on an air temperature threshold. Snowfall replenishes the snow water storage (wSnow), which is depleted by sublimation (E_sub) and snowmelt. Rainfall and snowmelt reach the soil surface, where vegetation controls their partitioning into interception loss (E_int), surface runoff, and percolation. The incoming percolation (I_in) infiltrates into the upper soil layer (wSoil1). When the capacity of wSoil1 is exceeded, infiltration excess (I_exc) is generated, which either refills the slowly varying water storage (wSlow) or leaves the system as fast runoff (Q_fast). Together with Q_fast, slow runoff from wSlow (Q_slow) constitutes the total runoff (Q) from the system. The deeper soil layer (wSoil2) exchanges moisture with the deep water storage (wDeep) according to the water potential gradient, creating capillary rise (from wDeep to wSoil2) and drainage (from wSoil2 to wDeep). A fraction of wDeep moves to wSlow. Evaporation (E) consists of E_int, E_sub, and soil evaporation (E_soil) from wSoil1; evapotranspiration (ET) is the sum of E and transpiration (T) from both soil layers. Terrestrial water storage (TWS) consists of wSnow, wSoil1, wSoil2, wDeep, and wSlow. The formulations for the water cycle components are presented in detail in Trautmann et al. (2022); minor updates to these formulations are described below.

[Figure 2. Structure of the model; image omitted, see PDF.]

The maximum water holding capacity of the first soil layer, $wSoil1_{max}$, is set by a calibrated parameter, $s_{max1}$. The maximum water holding capacity of the second soil layer, $wSoil2_{max}$, is calculated as

$$wSoil2_{max} = s_{max2} \cdot \sum_{d=1}^{4} \frac{s_{RD,d} \cdot RD_d}{s_{RD} \cdot RD_{max,d}} \tag{1}$$

where $s_{max2}$ is a scalar to calibrate, $s_{RD,d}$ is a calibrated weight for each rooting depth or soil water storage capacity data set, and $s_{RD}$ is the sum of the four $s_{RD,d}$. $RD_d$ is each data set, and $RD_{max,d}$ is the maximum of each data set. The same data sets as in Trautmann et al. (2022) are used here. In this revised formulation, the spatial pattern of $wSoil2_{max}$ is extracted from the four data sets through the calibrated weights ($s_{RD,d}$) and then scaled by the global parameter $s_{max2}$. The rooting depth data of Tian et al. (2019) (i.e., RD4 in Table 1) are available only for arid to moderately humid vegetated land. Therefore, a calibration parameter, $wSoil_{max(RD4)}$, is used to fill the gaps in RD4 within the study area before scaling. For each grid cell, data sets with missing or invalid values were excluded from the calculation. The total water holding capacity of the soil layers, $wSoil_{max}$, is the sum of $wSoil1_{max}$ and $wSoil2_{max}$.
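Equation 1 can be sketched for a single grid cell as follows. This is a minimal illustration, not the SINDBAD code: the function name and inputs are hypothetical, the RD4 gap filling with $wSoil_{max(RD4)}$ is assumed to have been applied beforehand, and re-normalizing the weights over the data sets that are present (when one is missing) is our assumption about how the exclusion could be implemented.

```python
from __future__ import annotations

from typing import Optional, Sequence


def wsoil2_max(
    s_max2: float,
    weights: Sequence[float],
    rd: Sequence[Optional[float]],
    rd_max: Sequence[float],
) -> float:
    """Eq. 1 for one grid cell.

    weights -> s_RD,d (calibrated); rd -> RD_d at this cell (None if missing);
    rd_max -> RD_max,d (maximum of each data set). ASSUMPTION: missing data
    sets are skipped and s_RD is summed over the remaining weights only.
    """
    valid = [(w, x, xm) for w, x, xm in zip(weights, rd, rd_max) if x is not None]
    s_rd = sum(w for w, _, _ in valid)  # s_RD
    # Normalize each data set by its maximum, weight it, then scale by s_max2.
    return s_max2 * sum(w * x / (s_rd * xm) for w, x, xm in valid)


rd_max = [4.0, 2.0, 600.0, 200.0]  # hypothetical data-set maxima
print(wsoil2_max(500.0, [3.0, 1.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0], rd_max))  # 375.0
print(wsoil2_max(500.0, [1.0, 1.0, 1.0, 1.0], [2.0, 1.0, 300.0, None], rd_max))
```

Normalizing each data set by its own maximum makes the differently unitized products (depths in m, capacities in mm) comparable before they are blended, which is why only their relative spatial pattern enters $wSoil2_{max}$.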

Table 1 Data Sets Used for Meteorological Forcing and Parameter Calibration of SINDBAD

Variable	Product	Usage (temporal form)
Precipitation	GPCP 1dd v1.3 (Huffman et al., 2001)	Forcing (daily)
Air temperature	CRUJRA v2.2 (Harris, 2021)	Forcing (daily)
Net radiation	CERES SYN1degEd4A (Wielicki et al., 1996)	Forcing (daily)
TWS	GRACE Tellus JPL RL06M v1 with CRI v1 (Wiese et al., 2016)	C&E (monthly)
SWE	GlobSnow v3 (Luojus et al., 2021)	C&E (monthly)
ET	FLUXCOM RS Ensemble (Jung et al., 2019)	C&E (ESV)
Q	G-RUN Ensemble v1 (Ghiggi, Humphrey, Seneviratne, & Gudmundsson, 2021)	C&E (ESV)
GPP	FLUXCOM RS Ensemble (Jung et al., 2020)	C&E (ESV)
NEE	OCO-2 v10 MIP (Byrne et al., 2023)	C&E (monthly)
EVI	MODIS EVI of FluxnetEO (Walther et al., 2022)	Prescribed properties (smoothed)
RD1	Maximum rooting depth (Y. Fan et al., 2017)	Prescribed properties (static)
RD2	Effective rooting depth (Yang et al., 2016b)	Prescribed properties (static)
RD3	Maximum soil water storage capacity (Wang-Erlandsson et al., 2016)	Prescribed properties (static)
RD4	Maximum plant-available water capacity (Tian et al., 2019)	Prescribed properties (static)
SOC	WoSIS snapshot 2019 (Batjes et al., 2020)	Prescribed properties (static)

Note. C&E, calibration and evaluation; ESV, expected seasonal variability.

Transpiration (T) at time $t$ is calculated as the minimum of supply ($T_{sup}$) and demand ($T_{dem}$):

$$T[t] = \min\left(T_{sup}[t],\, T_{dem}[t]\right) \tag{2}$$

where $T_{sup}[t]$ is the product of plant-available soil water (PAW) and the green fraction of absorbed photosynthetically active radiation (fAPAR), that is, the radiation absorbed by chlorophyll and therefore available for photosynthesis:

$$T_{sup}[t] = PAW[t] \cdot fAPAR[t] \tag{3}$$

$PAW[t]$ is a fixed fraction of the total water contained in the two soil layers that is available for root water uptake:

$$PAW[t] = k \cdot \left(wSoil1[t] + wSoil2[t]\right) \tag{4}$$

where $k$ is a calibrated parameter, and $wSoil1[t]$ and $wSoil2[t]$ are the soil water contents of the upper and lower layers at time $t$, respectively. Note that the soil moisture of both layers is available for root water uptake. To reduce the number of parameters to calibrate, $fAPAR[t]$ is derived by linearly scaling the enhanced vegetation index (EVI):

$$fAPAR[t] = \beta_0 + \beta_1 \cdot EVI[t] \tag{5}$$

where the intercept ($\beta_0$) and slope ($\beta_1$) are set to −0.09 and 0.8, respectively, based on a linear regression between MODIS EVI and green fAPAR from a 10,000-sample look-up table generated with the senSCOPE radiative transfer model (Pacheco-Labrador et al., 2021). senSCOPE improves the separation of green and total fAPAR in grid cells where green and senescent vegetation coexist, while still representing purely green or purely senescent canopies. The atmospheric evaporative demand, $T_{dem}[t]$, is the product of potential evapotranspiration (PET), the Priestley-Taylor coefficient ($\alpha_{veg}$), and fAPAR:

$$T_{dem}[t] = PET[t] \cdot \alpha_{veg} \cdot fAPAR[t] \tag{6}$$

where $PET[t]$ is provided as forcing data and $\alpha_{veg}$ is calibrated.
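Equations 2 to 6 can be sketched for one daily time step as below. The function names and parameter values are hypothetical placeholders, not calibrated SINDBAD values. Clipping fAPAR to [0, 1] is an illustrative safeguard we added (Equation 5 with $\beta_0 = -0.09$ goes negative for EVI below about 0.11); the text does not state whether the model clips.

```python
from __future__ import annotations


def green_fapar(evi: float, beta0: float = -0.09, beta1: float = 0.8) -> float:
    """Eq. 5, clipped to [0, 1] (the clipping is an illustrative assumption)."""
    return min(max(beta0 + beta1 * evi, 0.0), 1.0)


def transpiration(wsoil1: float, wsoil2: float, evi: float, pet: float,
                  k: float, alpha_veg: float) -> tuple[float, str]:
    """Eqs. 2-6 for one daily step; returns (T, which side is limiting)."""
    fapar = green_fapar(evi)
    paw = k * (wsoil1 + wsoil2)        # Eq. 4: plant-available water
    t_sup = paw * fapar                # Eq. 3: supply
    t_dem = pet * alpha_veg * fapar    # Eq. 6: demand
    limit = "supply" if t_sup < t_dem else "demand"
    return min(t_sup, t_dem), limit    # Eq. 2


# Hypothetical wet and dry soil states under the same atmospheric demand.
print(transpiration(50.0, 150.0, 0.5, 4.0, k=0.05, alpha_veg=1.26))  # demand-limited
print(transpiration(2.0, 3.0, 0.5, 4.0, k=0.05, alpha_veg=1.26))     # supply-limited
```

Returning the limiting side alongside T mirrors the motivation given in the next paragraph: it lets supply- versus demand-limited conditions be tracked through to the carbon fluxes.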

Formulating transpiration as the minimum of supply and demand allows users to trace how supply- or demand-limited conditions for transpiration propagate to other water and carbon components in the model. The inclusion of fAPAR in Equations 3 and 6 accounts for the presence and functioning of vegetation in the grid cell: it scales transpiration supply and demand so that only the part of water or energy relevant to vegetation functions is considered, similar to Trautmann et al. (2022).

Carbon Cycle

The carbon processes start with the calculation of GPP, which is coupled to transpiration (T) through water use efficiency (WUE):

$$GPP[t] = T[t] \cdot WUE[t] \tag{7}$$

GPP thus inherits the effects of environmental conditions such as soil moisture and evaporative demand from transpiration. We chose this simple formulation because its performance was comparable to that of a light use efficiency model in which potential GPP is scaled by the effects of light, cloudiness, air temperature, VPD, and soil moisture (not shown here), making it the most parsimonious option. Moreover, Equation 7 allows us to reuse the existing water cycle model structure of Trautmann et al. (2022). The leaf-scale biochemical model of Farquhar, von Caemmerer, and Berry (Farquhar et al., 1980) was not used, to avoid the uncertainty associated with upscaling from the leaf scale to larger spatial scales (e.g., 1° or regional scale).

To relate transpiration to vegetation productivity, water use efficiency is calculated following the PRELES model (Peltoniemi, 2012):

$$WUE[t] = WUE_{ref} \cdot \exp\left(-\kappa \cdot VPD_{day}[t]\right) \cdot fCO2[t] \tag{8}$$

where $WUE_{ref}$, the WUE at a VPD of 1 hPa, and $\kappa$ are parameters to calibrate, and $VPD_{day}[t]$ is the daytime mean vapor pressure deficit provided as forcing data. The modifier $fCO2[t]$ is calculated as

$$fCO2[t] = 1 + \frac{C_a[t] - C_{a0}}{C_a[t] - C_{a0} + C_m} \tag{9}$$

where $C_a[t]$ is the ambient CO₂ concentration (ppm), $C_{a0}$ is the baseline CO₂ concentration, set to 380 ppm, and $C_m$ is a sensitivity parameter (ppm). The WUE formulation thus accounts for the effects of VPD and atmospheric CO₂ concentration, and it enables users to test the effect of increasing CO₂ concentration over long simulation periods by forcing the model with CO₂ observations. In the current study, however, the effect of CO₂ via $fCO2[t]$ does not vary with time, as the CO₂ concentration is fixed at 400 ppm for the whole simulation period. We use a fixed CO₂ concentration, assuming that its effect on the results is negligible given the short simulation period (5 years).
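Equations 7 to 9 can be sketched as three small functions. The names and the numeric values of $WUE_{ref}$, $\kappa$, and $C_m$ below are hypothetical placeholders, not calibrated values; units follow the text (VPD in hPa, CO₂ in ppm).

```python
from __future__ import annotations

import math


def f_co2(ca: float, cm: float, ca0: float = 380.0) -> float:
    """Eq. 9: CO2 modifier of water use efficiency (all inputs in ppm)."""
    return 1.0 + (ca - ca0) / (ca - ca0 + cm)


def wue(vpd_day: float, wue_ref: float, kappa: float, cm: float, ca: float = 400.0) -> float:
    """Eq. 8: WUE declines exponentially with daytime mean VPD."""
    return wue_ref * math.exp(-kappa * vpd_day) * f_co2(ca, cm)


def gpp(transpiration: float, wue_value: float) -> float:
    """Eq. 7: GPP as transpiration times WUE."""
    return transpiration * wue_value


# With C_a fixed at 400 ppm, f_co2 is the constant 1 + 20 / (20 + C_m).
print(f_co2(400.0, cm=20.0))  # 1.5
print(gpp(2.0, wue(vpd_day=10.0, wue_ref=3.0, kappa=0.05, cm=20.0)))
```

The first print illustrates a consequence of the fixed CO₂ setting: $fCO2$ becomes a constant factor, so within this study it is not separable from $WUE_{ref}$ during calibration.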

Respiratory carbon loss is calculated as the sum of autotrophic and heterotrophic respiration. Part of GPP is lost as autotrophic respiration (RA), and the remainder constitutes net primary productivity (NPP).

Autotrophic respiration depends on GPP, following Migliavacca et al. (2015):

$$RA[t] = \left(k_2 \cdot GPP[t] + k_{20} \cdot dEVI_{smth}[t]\right) \cdot f(Ta)[t] \tag{10}$$

where $k_2$ describes the dependence of $RA[t]$ on $GPP[t]$, $k_{20}$ is a scalar, and $dEVI_{smth}[t]$ is the change in smoothed EVI between the previous and the current time step. For the second term in parentheses, only positive $dEVI_{smth}[t]$ values are considered and negative values are set to zero, because this term accounts for the springtime effect of physiological phenology on RA through enhanced growth rates (Migliavacca et al., 2015). The air temperature scalar, $f(Ta)[t]$, is calculated with an Arrhenius-type equation:

$$f(Ta)[t] = \exp\left(E_0 \cdot \left(\frac{1}{Ta_{ref} - Ta_0} - \frac{1}{Ta[t] - Ta_0}\right)\right) \tag{11}$$

where $E_0$ (K) is the activation energy parameter, which regulates the sensitivity of respiration to air temperature, $Ta_{ref}$ is set to 288.15 K (15°C), and $Ta_0$ is set to 227.13 K (−46.02°C).
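Equations 10 and 11 translate directly into the sketch below. The function names and the parameter values in the usage lines ($k_2$, $k_{20}$, $E_{01}$) are hypothetical; temperature is taken in kelvin, as the constants in Equation 11 imply.

```python
from __future__ import annotations

import math

TA_REF = 288.15  # K (15 °C), reference temperature in Eq. 11
TA_0 = 227.13    # K (-46.02 °C), temperature offset in Eq. 11


def f_ta(ta_k: float, e0: float) -> float:
    """Eq. 11: equals 1 at TA_REF and increases with warming when e0 > 0."""
    return math.exp(e0 * (1.0 / (TA_REF - TA_0) - 1.0 / (ta_k - TA_0)))


def autotrophic_respiration(gpp: float, d_evi_smth: float, ta_k: float,
                            k2: float, k20: float, e01: float) -> float:
    """Eq. 10: GPP-dependent RA plus a green-up (positive dEVI) growth term."""
    growth = max(d_evi_smth, 0.0)  # negative EVI changes are set to zero
    return (k2 * gpp + k20 * growth) * f_ta(ta_k, e01)


# At the reference temperature, only the bracketed term matters.
print(autotrophic_respiration(10.0, -0.01, 288.15, k2=0.5, k20=100.0, e01=300.0))  # 5.0
print(autotrophic_respiration(10.0, 0.01, 288.15, k2=0.5, k20=100.0, e01=300.0))   # 6.0
```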

NPP is added to the vegetation carbon pool (cVeg) as its only influx. cVeg in SINDBAD is a lumped representation of vegetation carbon, with no allocation to different components such as roots. cVeg loses carbon only via litterfall.

Litterfall is quantified by scaling the decrease in smoothed daily EVI:

$$Litterfall[t] = \begin{cases} r \cdot \left(EVI[t-1] - EVI[t]\right), & \text{if } EVI[t-1] - EVI[t] > 0 \\ 0, & \text{otherwise} \end{cases} \tag{12}$$

where $r$ is a scaling parameter to calibrate. Smoothed daily EVI was used because spectral vegetation indices vary considerably in response to multiple factors such as leaf aging (Chavana-Bryant et al., 2017), whereas we only want to extract the phenological signal of leaf senescence from EVI. Litterfall flows to the litter carbon pool (cLit), whose decay produces heterotrophic respiration (RH) from cLit (RHcLit):

$$cLit[t+1] = cLit[t] + Litterfall[t] - RHcLit[t] \tag{13}$$
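Equations 12 and 13, together with the cVeg balance stated above (NPP as the only influx and litterfall as the only efflux), can be sketched as a daily pool update. The names and numbers are hypothetical; RHcLit is passed in here because it is computed from Equation 14.

```python
from __future__ import annotations


def litterfall(evi_prev: float, evi_now: float, r: float) -> float:
    """Eq. 12: litterfall proportional to a day-to-day drop in smoothed EVI."""
    drop = evi_prev - evi_now
    return r * drop if drop > 0 else 0.0


def step_pools(c_veg: float, c_lit: float, npp: float, litter: float,
               rh_clit: float) -> tuple[float, float]:
    """One daily update of the vegetation and litter carbon pools."""
    c_veg_next = c_veg + npp - litter      # NPP in, litterfall out
    c_lit_next = c_lit + litter - rh_clit  # Eq. 13
    return c_veg_next, c_lit_next


# Hypothetical smoothed EVI: green-up (no litterfall), then senescence.
evi = [0.60, 0.62, 0.58, 0.55]
c_veg, c_lit = 1000.0, 50.0
for prev, now in zip(evi, evi[1:]):
    lf = litterfall(prev, now, r=100.0)
    c_veg, c_lit = step_pools(c_veg, c_lit, npp=2.0, litter=lf, rh_clit=1.5)
print(round(c_veg, 3), round(c_lit, 3))
```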

RHcLit is calculated as

$$RHcLit[t] = cLit[t] \cdot k_{base} \cdot f(Ta)[t] \cdot f(wSoil)[t] \tag{14}$$

where $k_{base}$ is the daily base decay rate of cLit, calculated as

$$k_{base} = 1 - e^{-k_{ann}/N_{DPY}} \tag{15}$$

where the calibration parameter $k_{ann}$ is the annual decay rate of cLit and $N_{DPY}$ is the number of days per year. $k_{base}$ is scaled by air temperature (Ta) and soil moisture (wSoil) stress factors. $f(Ta)[t]$ has the same form as for RA (Equation 11) but a separate sensitivity parameter ($E_{01}$ for RA, and $E_{02}$ for RHcLit and RHcSoil). Different temperature sensitivities for RA and RH are supported by other studies (e.g., Rey et al., 2002; X. Wang et al., 2014). $f(wSoil)$ is calculated by adapting Equation 7 of Potter et al. (1993) (see Figure A1 for the response curve):

$$f(wSoil)[t] = \frac{\left(1 + \exp(-10A)\right)^2}{\left(1 + \exp\left(A\left(W_{opt} - 10 - wSoil[t]/wSoil_{max}\right)\right)\right) \cdot \left(1 + \exp\left(A\left(-W_{opt} - 10 + wSoil[t]/wSoil_{max}\right)\right)\right)} \tag{16}$$

where $A$ is a calibrated parameter that determines the sensitivity of the curve, and $W_{opt}$ is a calibrated parameter representing the optimum relative soil moisture. $f(wSoil)[t]$ is evaluated separately on the left-hand side of $W_{opt}$ (i.e., $wSoil[t]/wSoil_{max} < W_{opt}$) and on the right-hand side (i.e., $wSoil[t]/wSoil_{max} \geq W_{opt}$), so that the two half-curves can have different sensitivity parameters ($A$ for the left-hand part and $B$, which takes the place of $A$ in Equation 16, for the right-hand part). $f(wSoil)[t]$ equals 1 when $wSoil[t]/wSoil_{max}$ equals $W_{opt}$ and decreases as $wSoil[t]/wSoil_{max}$ diverges from $W_{opt}$.
This formulation accounts for both dry and wet stress on respiration. For RHcLit, the soil moisture of the upper layer (i.e., wSoil1) is used, assuming that all litter decomposition occurs in the upper soil layer. RHcSoil is calculated from substrate availability and the effects of temperature and soil water content:

$$RHcSoil[t] = \left(\beta_2 + \beta_3 \cdot SOC\right) \cdot f(Ta)[t] \cdot f(wSoil)[t] \tag{17}$$

where two regression parameters ($\beta_2$ and $\beta_3$) linearly scale soil organic carbon (SOC) to the base heterotrophic respiration of the soil carbon pool. SOC is taken from the World Soil Information Service (WoSIS) snapshot 2019 (Batjes et al., 2020). $f(Ta)[t]$ is the same as for RHcLit, with the same sensitivity parameter ($E_{02}$). $f(wSoil)[t]$ also has the same form as for RHcLit, but the soil moisture of both layers is used.
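Equations 14 to 17 can be sketched as follows. This is a hedged illustration with hypothetical names and parameter values: the caller supplies the relative soil moisture ($wSoil/wSoil_{max}$) for the relevant layer(s), since the text does not specify how the upper-layer value is normalized for RHcLit, and the 365-day default for $N_{DPY}$ is an assumption (leap years would use 366). Equation 16 is implemented exactly as written, with slope $B$ replacing $A$ at or above $W_{opt}$.

```python
from __future__ import annotations

import math

TA_REF, TA_0 = 288.15, 227.13  # K, as in Eq. 11


def f_ta(ta_k: float, e0: float) -> float:
    """Eq. 11, used here with E_02 for heterotrophic respiration."""
    return math.exp(e0 * (1.0 / (TA_REF - TA_0) - 1.0 / (ta_k - TA_0)))


def k_base(k_ann: float, n_dpy: int = 365) -> float:
    """Eq. 15: daily base decay rate from an annual decay rate."""
    return 1.0 - math.exp(-k_ann / n_dpy)


def f_wsoil(w_rel: float, w_opt: float, a: float, b: float) -> float:
    """Eq. 16; slope a left of the optimum, b at or right of it."""
    s = a if w_rel < w_opt else b
    num = (1.0 + math.exp(-10.0 * s)) ** 2
    den = ((1.0 + math.exp(s * (w_opt - 10.0 - w_rel)))
           * (1.0 + math.exp(s * (-w_opt - 10.0 + w_rel))))
    return num / den


def rh_clit(c_lit: float, k_ann: float, ta_k: float, e02: float,
            w_rel_top: float, w_opt: float, a: float, b: float) -> float:
    """Eq. 14: litter decay, with moisture from the upper soil layer."""
    return c_lit * k_base(k_ann) * f_ta(ta_k, e02) * f_wsoil(w_rel_top, w_opt, a, b)


def rh_csoil(soc: float, beta2: float, beta3: float, ta_k: float, e02: float,
             w_rel_both: float, w_opt: float, a: float, b: float) -> float:
    """Eq. 17: base rate linear in SOC, with moisture from both soil layers."""
    return (beta2 + beta3 * soc) * f_ta(ta_k, e02) * f_wsoil(w_rel_both, w_opt, a, b)
```

Heterotrophic respiration (Equation 18) is then the sum of the two functions' outputs for the same day.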

The two respiration terms, RH from the litter carbon pool (RHcLit) and RH from the slow soil carbon pool (RHcSoil), constitute heterotrophic respiration:

$$RH[t] = RHcLit[t] + RHcSoil[t] \tag{18}$$

RHcLit captures the effect of litterfall phenology on RH dynamics, and RHcSoil mimics the decay of the slow soil carbon pool. In addition, both RHcLit and RHcSoil account for the delayed decomposition of soil carbon once environmental conditions become favorable, for example, the pulse of respiration after soil rewetting (Metz et al., 2023; Santos e Silva et al., 2024). Finally, ecosystem respiration is calculated as the sum of autotrophic and heterotrophic respiration, and net ecosystem exchange as the difference between RECO and GPP:

$$NEE[t] = RA[t] + RHcLit[t] + RHcSoil[t] - GPP[t] = RECO[t] - GPP[t] \tag{19}$$

Note that our model does not account for fire emissions.

Forcing and Observational Data

We used the same forcing variables as Lee et al. (2023) (Table 1): precipitation from GPCP 1DD v1.3 (Huffman et al., 2001), air temperature from CRU JRA v2.2 (Harris, 2021), and net radiation from CERES SYN1deg Ed4A (Wielicki et al., 1996). Using different precipitation or TWS products had only a minor influence on the simulated TWS (Lee et al., 2023). VPD was precalculated from air temperature, specific humidity, and surface air pressure from the ERA5 data set (Muñoz Sabater, 2019).

The observation-based data used to constrain the model parameters include (a) TWS from the Gravity Recovery and Climate Experiment (GRACE) Mascon Ocean, Ice, and Hydrology Equivalent Water Height Release 06 Coastal Resolution Improvement (CRI) Filtered Version 1.0 (Wiese et al., 2016), (b) snow water equivalent (SWE) from GlobSnow v3 (Luojus et al., 2021), (c) evapotranspiration from the FLUXCOM v1 RS ensemble (Jung et al., 2019), and (d) runoff from the Global RUNoff ENSEMBLE (G-RUN ENSEMBLE) v1 (Ghiggi, Humphrey, Seneviratne, & Gudmundsson, 2021). To obtain global coverage, GlobSnow SWE was gap-filled by assigning zero values in non-snow regions, following Kraft et al. (2022). The observational constraints for carbon fluxes are FLUXCOM GPP (Jung et al., 2020) and OCO-2 v10 MIP NEE (Byrne et al., 2023). RECO is not constrained directly; we assume that it is constrained indirectly through GPP and NEE, since RECO = NEE + GPP (Equation 19). The same carbon flux constraints are also used for model evaluation.

External data sets were used as proxies for some land surface characteristics. Vegetation properties such as the vegetation index and maximum rooting depth were used following Trautmann et al. (2022). The vegetation index, quality-controlled and gap-filled daily MODIS EVI from FluxnetEO (Walther et al., 2022), was used to account for vegetation control on water cycle processes such as canopy interception, transpiration demand, infiltration, and runoff generation. In the carbon cycle, it also determines litterfall, which feeds heterotrophic respiration from the litter carbon pool. The daily EVI was prescribed for 2001–2019 after smoothing to remove noise. Four data products of maximum rooting depth and soil water storage capacity (Table 1) were used as proxies for soil water holding capacity. Lastly, soil organic carbon from the World Soil Information Service (WoSIS) snapshot 2019 (Batjes et al., 2020) was used as a proxy for the substrate of heterotrophic respiration from the soil carbon pool.

For all forcing and constraining variables, we applied the spatial filter of Lee et al. (2023) to exclude grid cells with (a) a significant fraction of ice, snow, water bodies, bare land, or artificial land cover, or (b) a significant anthropogenic impact on the GRACE TWS trend, for example, through groundwater extraction. These grid cells were excluded to avoid potential calibration biases from processes that SINDBAD does not adequately represent (Lee et al., 2023).

The model was run at daily and 1° × 1° resolution for 2001–2019, whereas calibration used monthly regional time series of SINDBAD and the observational constraints for 2015–2019 to obtain a set of optimal parameters, each of which is globally constant (i.e., the same set of optimal parameters is shared by all grid cells). Accordingly, forcing variables and constraints were aggregated to the corresponding spatiotemporal resolutions. During calibration, the GPP, ET, and Q time series were used in the form of their expected seasonal variability, as FLUXCOM RS and G-RUN ENSEMBLE show significant biases in reproducing interannual variability (Ghiggi, Humphrey, Seneviratne, & Gudmundsson, 2021; Jung et al., 2020).
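The text does not define "expected seasonal variability" formally. Assuming it corresponds to a mean seasonal cycle (the average of each calendar month across years, repeated for every year), converting a monthly constraint could look like the sketch below; the function names are hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def mean_seasonal_cycle(monthly: list[float]) -> list[float]:
    """Average each calendar month across years (series starts in January)."""
    if len(monthly) % 12:
        raise ValueError("expected whole years of monthly values")
    return [fmean(monthly[m::12]) for m in range(12)]


def as_seasonal_constraint(monthly: list[float]) -> list[float]:
    """Replace a monthly series by its mean seasonal cycle, repeated each year."""
    n_years = len(monthly) // 12
    return mean_seasonal_cycle(monthly) * n_years


# Two hypothetical years that differ only by a constant offset of 2.
series = [float(m) for m in range(12)] + [float(m + 2) for m in range(12)]
print(mean_seasonal_cycle(series))  # [1.0, 2.0, ..., 12.0]
```

Under this reading, the model is penalized only for errors in the seasonal shape and level of these variables, not for year-to-year deviations that the constraining products are known to capture poorly.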

Model Calibration

The purpose of the calibration is to find the set of parameter values that best represents the global patterns of the water and carbon cycles with respect to the constraints. This is achieved by iteratively comparing model estimates with observational constraints to calculate the total cost (i.e., a measure of model-data misfit) and using a search algorithm to adjust parameter values to minimize it. The resulting parameters in the SINDBAD framework are spatiotemporally constant (i.e., one value per parameter for a simulation). This not only reduces the computational cost of the optimization but also aims to capture emergent patterns at larger spatial scales as far as possible. We assume that the spatial variability of state and flux variables can be captured by the prescribed data sets. We apply a multi-criteria approach, similar to Trautmann et al. (2018, 2022), to consider different aspects of Earth observations simultaneously during parameter calibration. We define a cost function and adjust the parameter values with the covariance matrix adaptation evolution strategy (CMA-ES) search algorithm (N. Hansen & Ostermeier, 2001) to minimize the total cost (i.e., the sum of the cost components). CMA-ES is a stochastic global search algorithm that is efficient and competitive across a range of optimization problems (N. Hansen et al., 2010). It has been applied in various fields, including robotics (e.g., Hasselmann et al., 2021), hydrology (e.g., Rincón et al., 2023; Trautmann et al., 2018, 2022), and ecology (e.g., Van der Meersch & Chuine, 2023).

To reduce computational cost and overfitting, parameter calibration is performed on a subset of grid cells and years: 904 1° × 1° grid cells (only 8% of all grid cells) and a 5-year period (2015–2019) were used instead of all grid cells (ca. 11,000) and the full period (2001–2019). The grid cells were chosen by stratified sampling over the Köppen-Geiger climate regions (Kottek et al., 2006) to preserve the overall proportions among regions (Trautmann et al., 2022).
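Stratified sampling that preserves class proportions can be sketched as below. This is a generic proportional-allocation sketch, not the procedure of Trautmann et al. (2022): the function name, the toy classes, the rounding rule, and the guarantee of at least one cell per class are assumptions.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_sample(cell_class: dict[int, str], n_total: int, seed: int = 0) -> list[int]:
    """Sample cells so that each climate class keeps its share of all cells.

    Allocation is proportional with rounding (so the total can differ slightly
    from n_total), with at least one cell per class (an assumption).
    """
    rng = random.Random(seed)
    by_class: dict[str, list[int]] = defaultdict(list)
    for cell, cls in cell_class.items():
        by_class[cls].append(cell)
    sample: list[int] = []
    for cls in sorted(by_class):
        cells = sorted(by_class[cls])
        share = round(n_total * len(cells) / len(cell_class))
        k = min(len(cells), max(1, share))
        sample.extend(rng.sample(cells, k))
    return sample


# Hypothetical domain: 80 cells of class "A" and 20 of class "B".
classes = {i: ("A" if i < 80 else "B") for i in range(100)}
picked = stratified_sample(classes, 10)
print(sum(classes[c] == "A" for c in picked), sum(classes[c] == "B" for c in picked))  # 8 2
```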

The calibration was performed at the regional scale, as the native resolution of some constraints, such as OCO-2 NEE and GRACE TWS, is coarser than 1°. The TRANSCOM regions were used for calibration. Specifically, the SINDBAD and constraint time series of the 904 sampled grid cells were aggregated into TRANSCOM regions using the land area of each grid cell as the weight. The regional time series were concatenated end to end for each constraint (e.g., one concatenated regional time series for NEE, one for TWS, and so on), and the model-data mismatch for each constraint was quantified using the Nash-Sutcliffe efficiency (NSE; Nash & Sutcliffe, 1970):

$$Cost_c = 1 - NSE = \frac{\sum_{t=1}^{N} \left(m_t - o_t\right)^2}{\sum_{t=1}^{N} \left(o_t - \overline{o}\right)^2} \tag{20}$$

where $c$ is an observational constraint (i.e., one of the calibration data sets listed in Table 1), $t$ is a time step, and $N$ is the number of time steps. $m_t$ and $o_t$ are the regional monthly model simulation and observational constraint, respectively, and $\overline{o}$ is the mean of the monthly time series of the observational constraint, $o$. The costs of all constraints were then summed to obtain the total cost, which the optimization algorithm minimized by adjusting parameter values:

$$Cost_{total} = \sum_{c=1}^{N_c} Cost_c \tag{21}$$

where $N_c$ is the number of observational constraints.
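The aggregation and cost calculation described above (area-weighted regional means, end-to-end concatenation per constraint, Equation 20 per constraint, and Equation 21 as their sum) can be sketched as follows. All names and the toy series are hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def area_weighted_mean(values: list[float], areas: list[float]) -> float:
    """Aggregate grid-cell values of one month to a region, weighted by land area."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


def cost_nse(model: list[float], obs: list[float]) -> float:
    """Eq. 20: 1 - NSE (0 for a perfect fit, 1 for a model no better than the observed mean)."""
    o_bar = fmean(obs)
    sse = sum((m - o) ** 2 for m, o in zip(model, obs))
    sst = sum((o - o_bar) ** 2 for o in obs)
    return sse / sst


def total_cost(constraints: dict[str, tuple[list[list[float]], list[list[float]]]]) -> float:
    """Eq. 21: concatenate the regional series of each constraint, then sum the costs."""
    total = 0.0
    for model_regions, obs_regions in constraints.values():
        model = [x for region in model_regions for x in region]
        obs = [x for region in obs_regions for x in region]
        total += cost_nse(model, obs)
    return total


# Hypothetical: NEE fits perfectly; TWS is simulated as its observed mean.
print(total_cost({
    "NEE": ([[1.0, 2.0], [3.0]], [[1.0, 2.0], [3.0]]),
    "TWS": ([[2.0, 2.0, 2.0]], [[1.0, 2.0, 3.0]]),
}))  # 1.0
```

Because each term is normalized by the variance of its own observations, constraints with very different units and magnitudes (e.g., TWS in mm and NEE in gC m⁻² d⁻¹) contribute comparably to the total cost.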

Model Evaluation

Using the optimized parameter set, we ran a global model simulation including all 1° × 1° grid cells (ca. 11,000) in the study domain for 2001–2019. We first compared the simulations with the corresponding observation-based products to evaluate the overall fit of the model. This evaluation was conducted for the TRANSCOM regions and at the global scale, with the same temporal aggregation as in the calibration. Next, before analyzing their spatial variability, we further evaluated the simulated carbon fluxes against the constraints (FLUXCOM GPP and OCO-2 NEE) and against the TRENDY v9 dynamic global vegetation models. These models were used to estimate the global carbon budget for 2021 (Friedlingstein et al., 2022) and are introduced here as current state-of-the-art models. The S2 experiment was used to exclude the effect of land cover change, which is not considered by SINDBAD. Two metrics were used to evaluate the carbon fluxes: (a) the coefficient of determination (R²), computed as the square of the Pearson correlation coefficient, to measure temporal coherence, and (b) the relative absolute error (RAE, Equation 22) to measure bias:

$$\mathrm{RAE} = \frac{\sum_{t=1}^{N}\left|m_t - o_t\right|}{\sum_{t=1}^{N}\left|o_t\right|} \tag{22}$$

where $m_t$ and $o_t$ are the model simulation and observation, respectively, at time $t$, and $N$ is the number of time steps. The evaluation was conducted at the global scale and for each TRANSCOM region for which the model parameters were calibrated. Within each spatial domain, we compared the interannual variability of NEE and the seasonality of GPP and NEE.
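The two evaluation metrics take only a few lines each. This sketch (hypothetical function names, NumPy assumed) also shows why they are complementary: a simulation that doubles the observations is perfectly correlated with them (R² = 1) yet has an RAE of 1.

```python
import numpy as np


def r_squared(model, obs):
    """Coefficient of determination as the squared Pearson correlation."""
    r = np.corrcoef(model, obs)[0, 1]
    return r ** 2


def rae(model, obs):
    """Relative absolute error (Equation 22)."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    return np.sum(np.abs(model - obs)) / np.sum(np.abs(obs))


obs = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(2 * obs, obs), rae(2 * obs, obs))
```

R² is insensitive to scaling and offsets, so it only measures timing and shape; RAE picks up the magnitude bias that R² misses.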
The interannual variability of a time series was calculated as follows:

$$\mathrm{IAV}_{i,mn} = X_{i,mn} - \mathrm{fit}\left(X_{i,mn}\right) \tag{23}$$

where $i$ denotes a grid cell or a region, $X$ is a CO₂ flux variable, $mn$ denotes a month of the year, and $\mathrm{fit}()$ returns the fitted values of a first-order polynomial (linear) regression over the time series of a given month across years (e.g., one linear regression fitted to January values across years, another to February values across years, and so on; see Figures A2a and A2b for an illustration). We then applied a 12-month running mean to the interannual variability to smooth noisy signals. GPP was excluded from the evaluation of interannual variability, as FLUXCOM GPP is known to have limited skill at this time scale (Jung et al., 2020). For NEE IAV, we additionally evaluated SINDBAD against an independent inversion product, Jena CarboScope (Rödenbeck et al., 2018), which provides a longer time series (2001–2019) than the OCO-2 product (2015–2019) used in the calibration. As our model does not consider fire processes, which make significant direct (Piao et al., 2020) and indirect (Yin et al., 2020) contributions to global and regional NEE IAV, we excluded fire-induced carbon emissions from our analyses by removing the emissions estimated by the Global Fire Emissions Database (GFED) version 4 (Giglio et al., 2013) from the modeled and observed NEE time series before calculating IAV. We evaluate the effect of fire CO₂ emissions in Section 3.4.
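Equation 23 and the subsequent smoothing can be sketched as below. The sketch assumes complete monthly data arranged as an (n_years, 12) array and a 12-month window that keeps only windows lying fully inside the record; the exact window alignment used in the study and the function name `iav` are assumptions.

```python
import numpy as np


def iav(x, years):
    """Interannual variability following Equation 23, then a 12-month running mean.

    x     : (n_years, 12) array, rows = years, columns = calendar months
    years : (n_years,) array of years used as the regression predictor
    """
    x = np.asarray(x, dtype=float)
    resid = np.empty_like(x)
    for mn in range(12):
        # Linear fit of this calendar month across years (e.g. all Januaries)
        slope, intercept = np.polyfit(years, x[:, mn], deg=1)
        resid[:, mn] = x[:, mn] - (slope * years + intercept)
    series = resid.ravel()              # back to chronological month order
    window = np.ones(12) / 12.0
    # "valid" keeps only windows fully inside the record (length n_years*12 - 11)
    return np.convolve(series, window, mode="valid")


years = np.arange(2001, 2020)
season = np.sin(2 * np.pi * np.arange(12) / 12)
x_trend = season[None, :] + 0.1 * (years - 2001)[:, None]  # cycle + trend only
x_anom = x_trend.copy()
x_anom[years == 2016] += 0.5                                 # one anomalous year
smoothed = iav(x_anom, years)
```

For a seasonal cycle plus a linear trend, the result is zero up to rounding error. The anomalous year appears as a positive peak in the smoothed series, slightly damped because the anomaly also pulls the fitted trend line toward itself.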

We quantified the seasonality (i.e., expected seasonal variability, ESV) of a time series as follows:

$$\mathrm{ESV}_{i,mn} = \mathrm{mean}\left(X_{i,mn} - \mathrm{fit}\left(X_{i,mn} - \mathrm{mean}\left(X_{i,mn}\right)\right)\right) \tag{24}$$

where $i$ denotes a grid cell or a region, $mn$ denotes a month of the year, and $\mathrm{fit}()$ returns the fitted values of the first-order polynomial (linear) regression over the time series of a given month across years, as in Equation 23. The ESV is thus the mean value of the month across years after the effect of the linear trend has been removed. The monthly mean, $\mathrm{mean}\left(X_{i,mn}\right)$, was subtracted before fitting the linear trend so that the magnitude of $X_{i,mn}$ is preserved. See Figures A2a and A2c for an illustration.
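Equation 24 can be sketched in the same way, again assuming complete (n_years, 12) arrays and a least-squares line per calendar month; the function name `esv` is hypothetical. One property is worth noting if Equation 24 is read literally. A least-squares fit with an intercept reproduces the mean of the data it is fitted to, so the fitted values of the centered series average to zero. The across-year mean then coincides with the plain monthly mean whenever all years are present, which is consistent with the near-identical ESV and mean seasonal cycle reported in Figure A3.

```python
import numpy as np


def esv(x, years):
    """Expected seasonal variability following Equation 24.

    x : (n_years, 12) array of monthly values; returns a length-12 cycle.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty(12)
    for mn in range(12):
        col = x[:, mn]
        anom = col - col.mean()                        # remove the month's mean first
        slope, intercept = np.polyfit(years, anom, deg=1)
        detrended = col - (slope * years + intercept)  # trend removed, level kept
        out[mn] = detrended.mean()
    return out


years = np.arange(2015, 2020)
rng = np.random.default_rng(0)
x = rng.normal(size=(years.size, 12)) + 0.3 * (years - years[0])[:, None]
print(np.allclose(esv(x, years), x.mean(axis=0)))   # compare with plain monthly mean
```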

Note that the calculation of ESV differs from the mean seasonal cycle commonly used in previous studies, as the linear trend information is excluded. This was done to exclude the potential effect of the linear trend from the model evaluation and attribution analysis, because SINDBAD in this study was forced with a constant atmospheric CO₂ concentration and therefore lacks the effect of rising CO₂ on land-atmosphere CO₂ fluxes. We compared ESV with the mean seasonal cycle, and the results are almost identical (Figure A3).

To calculate the interannual variability or expected seasonal variability at the global scale, we averaged the IAV or ESV of regions using their area as the weight. For the OCO-2 and TRENDY ensembles, the uncertainty across members was measured using the mean absolute deviation multiplied by 1.25, which approximates one standard deviation for a normal distribution (Raju & Srinivasan, 1996). This measure is more robust against outliers than the standard deviation (Jung et al., 2011).
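The ensemble spread and the global area-weighted average can be sketched as below. The text does not state whether deviations are taken from the ensemble mean or median, so using the mean here is an assumption, as are the function names. For a normal distribution, $E|X-\mu| = \sigma\sqrt{2/\pi} \approx 0.80\,\sigma$, which is where the factor of about 1.25 comes from.

```python
import numpy as np


def ensemble_spread(members):
    """Robust spread across ensemble members (axis 0): 1.25 x mean absolute
    deviation from the ensemble mean, which approximates one standard
    deviation when members are normally distributed."""
    members = np.asarray(members, dtype=float)
    mad = np.mean(np.abs(members - members.mean(axis=0)), axis=0)
    return 1.25 * mad


def area_weighted_mean(region_values, region_area):
    """Global IAV or ESV as the area-weighted mean over regions (axis 0)."""
    return np.average(region_values, axis=0, weights=region_area)
```

A single extreme member shifts the mean absolute deviation less than the standard deviation, because deviations enter linearly rather than squared.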

Analysis of CO₂ Flux Variability

Finally, we analyzed the spatial variability of global CO₂ fluxes (GPP, RECO, and NEE) arising from different grid cells and regions. Specifically, we applied the covariance matrix analysis of Lee et al. (2023) to quantify grid cell-wise and regional contributions to the variance in the globally aggregated temporal pattern, calculated as

$$f_i = \frac{\sum_{j=1}^{n}\mathrm{Cov}\left(X_i, X_j\right)}{\mathrm{Var}\left(\sum_{k=1}^{n} X_k\right)} \tag{25}$$

where $f_i$ is the relative contribution of grid cell or region $i$, $X_i$ is the CO₂ flux (in units of carbon mass per time), or water volume, of grid cell or region $i$, and $\mathrm{Cov}\left(X_i, X_j\right)$ is the covariance of $X$ between grid cells or regions $i$ and $j$. The numerator is the sum of the covariances between grid cell or region $i$ and itself or any other grid cell or region; in other words, it is the sum of the elements in the $i$th row (or column) of the $n \times n$ covariance matrix. The denominator is the variance of the sum of CO₂ fluxes across grid cells or regions, that is, the variance of globally aggregated $X$, which equals the sum of all elements of the $n \times n$ covariance matrix, where $n$ is the number of land grid cells or regions.
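Equation 25 can be sketched in NumPy as follows (hypothetical function name; not the implementation of Lee et al., 2023). The $i$th row sum of the covariance matrix equals the covariance between $X_i$ and the aggregated series, so the sketch computes $f_i$ without forming the full $n \times n$ matrix, which for about 11,000 grid cells would hold over $10^8$ entries. The synthetic example illustrates compensation: a unit that varies against the global signal receives a negative contribution, and the contributions still sum to one.

```python
import numpy as np


def covariance_contributions(X):
    """Relative contribution f_i (Equation 25) of each unit to Var(sum_i X_i).

    X : (n_units, n_time) fluxes in absolute units (e.g. PgC per month), so that
        the column sums give the globally aggregated series.
    Uses sum_j Cov(X_i, X_j) = Cov(X_i, sum_j X_j) to avoid building the
    n x n covariance matrix explicitly; the ddof factor cancels in the ratio.
    """
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=0)                          # globally aggregated series
    Xc = X - X.mean(axis=1, keepdims=True)         # centre each unit
    tc = total - total.mean()
    return (Xc @ tc) / (tc @ tc)                   # Cov(X_i, total) / Var(total)


# Two units whose seasonal cycles partly cancel (synthetic numbers)
s = np.sin(np.linspace(0, 4 * np.pi, 48))
f = covariance_contributions(np.vstack([2 * s, -s]))
print(f, f.sum())
```

The contribution of a group of units (e.g., all grid cells in one climate zone) is simply the sum of their $f_i$ values.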

The measure ${f}_{i}$ expresses, in relative and normalized terms, the share of all pairwise covariances across grid cells or regions that involves grid cell or region $i$. A positive contribution is assigned when the grid cell increases the global variance. The method can also quantify the contribution of a group of grid cells, as ${f}_{i}$ is additive and the contributions of all grid cells sum to one. Note that ${f}_{i}$ is not a measure of model performance or of the agreement between observation and simulation for grid cell or region $i$, as it also contains the covariances with other grid cells or regions. Rather, it provides a view of the spatial variability of global water and carbon cycles in each data product. Also note that the covariance matrix analysis quantifies the spatial contribution differently from the method used by Ahlström et al. (2015). Their method sums, over time steps, the values of the variable in each spatial unit, with the sign taken relative to the globally aggregated time series at each time step. In contrast, the covariance matrix analysis in this study quantifies the contribution of a spatial unit (e.g., a region) to the variance of the globally aggregated time series, considering both the variance of the region and its covariances with other regions, following the variance decomposition of a sum.
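For comparison, the index of Ahlström et al. (2015) can be sketched under our reading of that method: $f_j = \sum_t x_{jt}\,|X_t|/X_t \,/\, \sum_t |X_t|$, with $X_t = \sum_j x_{jt}$. The function name is hypothetical and the sketch is not taken from either study. Like ${f}_{i}$ in Equation 25, this index is additive and sums to one across units. The difference is the weighting: each time step counts according to the sign of the global value, rather than through covariances.

```python
import numpy as np


def ahlstrom_contributions(X):
    """Contribution index in the style of Ahlström et al. (2015).

    X : (n_units, n_time) anomaly series (e.g. IAV) in absolute units.
    Each unit's anomaly is multiplied by the sign of the global anomaly at the
    same time step, summed over time, and normalized by sum_t |X_t|.
    """
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=0)
    sign = np.sign(total)            # |X_t| / X_t; time steps with X_t = 0 drop out
    return (X * sign).sum(axis=1) / np.abs(total).sum()


rng = np.random.default_rng(7)
Z = rng.normal(size=(6, 40))         # synthetic anomalies for six regions
print(ahlstrom_contributions(Z).sum())
```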

We calculated the contribution measure for different spatial (i.e., grid cell-wise and regional) and temporal scales (i.e., seasonal and interannual variability). SINDBAD results were used to quantify the grid cell-wise contributions. For the regional calculation, we used the Köppen-Geiger climate zones (Kottek et al., 2006), simplified into fewer classes (see Figure A4), including a Tropical humid region and a Tropical savanna region (i.e., a Tropical semi-arid region). We used the Köppen-Geiger zones rather than the TRANSCOM regions because the climatic regions better reflect differences in climate and vegetation characteristics, allowing potential associations with different driving mechanisms of CO₂ fluxes. Note that the regional classification affects the regional attribution results (Zhang et al., 2018), as do the time scale and time period considered.

Results and Discussion

We first evaluate SINDBAD performance against observation-based estimates with the same temporal aggregation as used in the calibration (Section 3.1). Next, we further evaluate the seasonal and interannual variability of GPP and NEE against observations as well as state-of-the-art TRENDY models (Section 3.2). Then, we analyze grid cell-wise and regional contributions to the global CO₂ fluxes (Section 3.3). Lastly, we discuss the potential limitations of our study (Section 3.4).

Overall Performance

Table 2 lists the parameters with their ranges and optimized values. These parameters are spatio-temporally constant, so the optimizer determines them in terms of their global representativeness. Overall, the resulting parameter values are in broad agreement with other studies, albeit with differences due to the mismatch in scales.

Table 2 List of Parameters in SINDBAD

Parameter	Description	Unit	Range	Default	Optimized
Soil water capacity
${s}_{\mathit{max1}}$	the maximum soil water holding capacity of the first soil layer	$\mathrm{m}\mathrm{m}$	10–100	50	99.29
${s}_{\mathit{max2}}$	the scaling parameter to obtain the maximum soil water holding capacity of the second soil layer	$\mathrm{m}\mathrm{m}$	1,000–10,000	1,000	7151.63
${s}_{RD,1}$	weight for the maximum rooting depth by Y. Fan et al. (2017)	-	0–1	0.25	0.85
${s}_{RD,2}$	weight for the effective rooting depth by Yang et al. (2016b)	-	0–1	0.25	0.58
${s}_{RD,3}$	weight for the maximum soil water storage capacity by Wang-Erlandsson et al. (2016)	-	0–1	0.25	0.71
${s}_{RD,4}$	weight for the plant-available water capacity by Tian et al. (2019)	-	0–1	0.25	0.07
$wSoi{l}_{max(RD4)}$	maximum plant-available water capacity of the second soil layer for grid cells with missing estimates in Tian et al. (2019)	$\mathrm{m}\mathrm{m}$	0–1000	50	438.78
Transpiration
$k$	fraction of the second soil layer available for transpiration	-	0.01–0.1	0.05	0.09
${\alpha }_{\mathit{veg}}$	vegetation-specific alpha coefficient of the Priestley-Taylor equation	-	0.2–3	1	2.48
${\beta }_{0}$	intercept of the linear regression to estimate fAPAR from EVI	-	−0.09–0.09	−0.09	−0.09
${\beta }_{1}$	slope of the linear regression to estimate fAPAR from EVI	-	0.8–0.8	0.8	0.8
Water use efficiency
$WU{E}_{\mathit{ref}}$	WUE at 1 hPa of VPD	$\mathrm{g}\mathrm{C}\mathrm{m}\mathrm{m}{\mathrm{H}}_{2}{\mathrm{O}}^{-1}$	0.1–20	9.2	4.70
$\kappa$	scalar for the exponential response of WUE to VPD	$\mathrm{k}\mathrm{P}{\mathrm{a}}^{-1}$	0.06–0.7	0.4	0.25
${C}_{a0}$	the base level CO₂ concentration for the CO₂ concentration effect on WUE	$\mathrm{p}\mathrm{p}\mathrm{m}$	300–500	380	403.64
${C}_{m}$	the sensitivity scalar for the CO₂ concentration effect on WUE	$\mathrm{p}\mathrm{p}\mathrm{m}$	10–2,000	500	1938.10
Autotrophic respiration
${k}_{2}$	the degree of dependency of RA to GPP	-	0.01–0.99	0.5	0.42
${k}_{20}$	the degree of dependency of RA to physiological phenology	-	0.01–1500	300	96.87
${E}_{01}$	sensitivity of RA to temperature	$\mathrm{K}$	150–500	300	298.65
$T{a}_{\mathit{ref}}$	reference temperature for RA	${}^{\circ}$ C	15–15	15	15
$T{a}_{0}$	lower reference temperature for RA	${}^{\circ}$ C	−46.02–46.02	−46.02	−46.02
Litterfall generation
$r$	scalar to derive litterfall from EVI change	$\mathrm{g}\mathrm{C}{\mathrm{m}}^{-2}\mathrm{d}\mathrm{a}{\mathrm{y}}^{-1}$	300–3,000	1,500	575.81
Heterotrophic respiration
${k}_{\mathit{ann}}$	annual decay rate of cLit	$\mathrm{y}\mathrm{e}\mathrm{a}{\mathrm{r}}^{-1}$	0.5–148	14.8	15.79
${E}_{02}$	sensitivity of RH to temperature	$\mathrm{K}$	1–500	100	71.33
$A$	sensitivity of RH to soil moisture within the soil moisture range lower than ${W}_{\mathit{opt}}$	-	0.01–0.1	0.05	0.05
$B$	sensitivity of RH to soil moisture within the soil moisture range higher than ${W}_{\mathit{opt}}$	-	0.01–1	0.5	0.79
${W}_{\mathit{opt}}$	optimal soil water content for RH	%	70–95	90	87.83
${\beta }_{2}$	intercept of the linear regression to estimate RHcSoil from SOC	$\mathrm{g}\mathrm{C}{\mathrm{m}}^{-2}\mathrm{d}\mathrm{a}{\mathrm{y}}^{-1}$	0.01–2.5	1	0.22
${\beta }_{3}$	slope of the linear regression to estimate RHcSoil from SOC	$1{0}^{-3}\mathrm{d}\mathrm{a}{\mathrm{y}}^{-1}$	0.375–0.625	0.5	0.41

The maximum water holding capacity of the second soil layer $\left(wSoil{2}_{\mathit{max}}\right)$, estimated as a scaling factor $\left({s}_{\mathit{max2}}\right)$ times the harmonized spatial pattern of two rooting depth and two soil water capacity data sets (Equation 1), shows a spatial pattern similar to previous results (Figure B16 in Lee et al., 2023) and to an independent product by Stocker et al. (2023), but with a higher maximum (Figure A5). The optimal moisture content for heterotrophic respiration $\left({W}_{\mathit{opt}}\right)$ of 87.83% is within the range of values reported for various soil texture classes in other modeling studies (e.g., Moyano et al., 2012; Yan et al., 2018). Compared with Yan et al. (2018), our ${W}_{\mathit{opt}}$ is comparable to those of sandy loam and sandy clay loam, which prevail in South American and African tropical and semi-arid regions (e.g., Hengl et al., 2017; Ross et al., 2018).

The larger value of ${E}_{01}$ relative to ${E}_{02}$ indicates that autotrophic respiration is more sensitive to temperature variability than heterotrophic respiration. However, whether one component is more sensitive to temperature changes than the other is uncertain, especially at the global scale, as warming experiments usually do not consider above-ground autotrophic respiration and represent only a few species or biomes. X. Wang et al. (2014) analyzed the results of warming experiments across the globe and found a stronger response to warming of heterotrophic respiration than of root respiration, although drier conditions caused by warming also affected their results. Site- and plot-level studies reach varying conclusions. Hartley et al. (2007) conducted a warming experiment in wheat and maize plots and found that soil respiration is more sensitive to soil warming than root respiration is. Schindlbacher et al. (2009) analyzed changes in respiration in a 2-year soil warming experiment in a forest dominated by Norway spruce and concluded that microbial respiration increased slightly more than, but comparably to, root respiration, which was strongly coupled with the photosynthesis rate of plants. Collectively, more and longer observations are needed to account for the multiple determinants of the temperature sensitivity of respiration, such as soil moisture, nutrients, and vegetation function (Rey et al., 2002; Melillo et al., 2011; X. Wang et al., 2014). Nevertheless, our parameter calibration based on multiple observational products indicates a stronger temperature sensitivity of RA than of RH at the global scale, offering a perspective from a model-data integration framework.
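To give a sense of what the optimized values imply, the sketch below assumes that both respiration components follow a Lloyd-Taylor-type response, $f(T) = \exp\left[E_0\left(\frac{1}{T_{\mathit{ref}} - T_0} - \frac{1}{T - T_0}\right)\right]$, with $T_{\mathit{ref}} = 15$ °C and $T_0 = -46.02$ °C taken from Table 2. Two points are assumptions, not details given in this section: the functional form itself, and applying the same reference temperatures to RH. The function names are hypothetical.

```python
import math

T_REF = 15.0   # °C, reference temperature (Table 2, Ta_ref)
T0 = -46.02    # °C, lower reference temperature (Table 2, Ta_0)


def temperature_response(temp_c, e0, t_ref=T_REF, t0=T0):
    """Relative rate (equal to 1 at t_ref) under an assumed Lloyd-Taylor form.

    Temperatures enter only as differences from t0, so °C and K give the same result.
    """
    return math.exp(e0 * (1.0 / (t_ref - t0) - 1.0 / (temp_c - t0)))


def q10_equivalent(e0, t_low=15.0):
    """Rate ratio for a 10 °C warming starting from t_low."""
    return temperature_response(t_low + 10.0, e0) / temperature_response(t_low, e0)


E01, E02 = 298.65, 71.33   # optimized values for RA and RH (Table 2)
print(f"RA: {q10_equivalent(E01):.2f}  RH: {q10_equivalent(E02):.2f}")
```

Under these assumptions, warming from 15 °C to 25 °C roughly doubles RA (a ratio of about 2.0) but raises RH by less than 20%. This is one way to read the stronger temperature sensitivity of RA discussed above.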

We evaluate the overall performance of SINDBAD, run with the above set of optimized parameter values, against the observational products used for parameter calibration across TRANSCOM regions and at the global scale. Overall, SINDBAD GPP, NEE, TWS, ET, SWE, and Q broadly agree with the corresponding observation-based products (FLUXCOM GPP, OCO-2 NEE, GRACE TWS, FLUXCOM ET, GlobSnow SWE, and G-RUN Q) across the globe and TRANSCOM regions (Figure 3). The wide range of grid cell-wise performance reflects within-region heterogeneity of water and carbon cycles that is not necessarily captured by parameters calibrated against regional means. The grid cell-wise errors may also be attributable to inconsistencies among the constraints and meteorological forcing, as they are not produced within a single framework. Note that the performance for all grid cells in the study area is very similar to that for the 904 grid cells used for parameter calibration (Figure A6), indicating that the spatial and temporal subsets used for the calibration represent the global pattern of water and carbon fluxes well. We also observe that, for regions with low signal variability and/or magnitude, model performance is worse at the regional scale than at the grid cell scale, for example, for ET and GPP in the South American Tropical region (Figure 3d) and GPP in Australia (Figure 3k). See Table A1 for the evaluation of seasonal and interannual variability.

Temporal Variations of CO₂ Fluxes

Figure 4 compares GPP from SINDBAD and TRENDY models against FLUXCOM estimates over the TRANSCOM regions. SINDBAD captures the expected seasonal variability of GPP well, especially at the global scale and across boreal and temperate regions. It explains more than 95% of the variance in GPP ESV for most regions, except those without clear seasonality, such as the South American Tropical region (Figure 4d) and Australia (Figure 4k). In such regions, SINDBAD may not perform as well as in regions with stronger signals, as the calibration was conducted simultaneously for all regions. SINDBAD also performs well in terms of bias, with RAE < 0.35 for most regions. In the Eurasian Temperate region (Figure 4i), although SINDBAD captures the temporal pattern well, it underestimates GPP throughout the year (RAE = 0.29). Compared with the TRENDY model ensemble, SINDBAD shows comparable or better performance (${R}^{2}$) in reproducing the observed temporal pattern, underscoring the value of parameter calibration even when it is performed on a subset of the spatial and temporal domain. Additionally, it shows improved seasonality compared with the ensemble median of TRENDY models, especially in the second half of the year in the temperate and boreal regions.

For NEE ESV, SINDBAD agrees well with OCO-2 MIP at the global scale, and the two are in broad agreement across the TRANSCOM regions (Figure 5). Compared with the ensemble median of TRENDY models, SINDBAD shows slightly better ${R}^{2}$ and RAE at the global scale, while the comparison varies by region. Specifically, SINDBAD generally performs better in regions with relatively strong variability, such as the North American Temperate (Figure 5c), Eurasian Boreal (Figure 5h), and Europe (Figure 5l) regions, whereas the ensemble median of TRENDY models performs better in regions with relatively weak variability, such as Eurasian Temperate (Figure 5i), Tropical Asia (Figure 5j), and Australia (Figure 5k).

While SINDBAD NEE seasonal variability agrees well with OCO-2 across global and regional scales, it also shows notable biases. Specifically, in the North American Temperate region (Figure 5c), SINDBAD overestimates NEE ESV in summer. One possible reason is the effect of cropland management. The region includes an intensively managed agricultural area (i.e., the U.S. Corn Belt) with high GPP in the peak growing season (e.g., Guanter et al., 2014). These cropland effects are considered by neither FLUXCOM GPP (Jung et al., 2020) nor the model; therefore, SINDBAD underestimates the peak GPP in the region. The ensemble median of TRENDY models similarly underestimates GPP at the growing season peak, suggesting that processes relevant to croplands are lacking in current state-of-the-art models.

In the South American Temperate (Figure 5e) and Southern Africa (Figure 5g) regions, SINDBAD does not capture the decrease in OCO-2 NEE that starts around October, whereas TRENDY reasonably simulates the decrease in NEE ESV. The decrease in NEE ESV occurs 1–2 months after the start of the increase in GPP ESV, which SINDBAD successfully captures (Figures 4e and 4g). This indicates that the bias in SINDBAD NEE ESV arises from respiration processes, such as the respiration pulse upon rewetting at the start of the growing season (i.e., the Birch effect), an influential water-carbon interaction for CO₂ dynamics, particularly over drylands (e.g., Barnard et al., 2020; Z. Fan et al., 2015; Metz et al., 2023). Some TRENDY models that capture the respiratory pulse represent (a component of) RH as more sensitive to upper soil moisture than GPP is (Metz et al., 2023), and SINDBAD likewise uses upper soil moisture to calculate the moisture response of RHcLit. Nevertheless, SINDBAD does not capture the Birch effect. Possible reasons include (a) a less sensitive moisture response function and/or (b) the relatively thick topsoil in SINDBAD, which has a maximum water holding capacity $\left({s}_{\mathit{max1}}\right)$ of 99.29 mm. The TRENDY models that capture the Birch effect have relatively thinner topsoil. For example, the JSBACH model has a topsoil layer 65 mm thick (Reick et al., 2021), and the CLASSIC model has a topsoil layer 100 mm thick (Melton et al., 2020).

Regarding interannual variability (IAV), SINDBAD NEE IAV agrees reasonably with the constraint at global and regional scales (Figure 6). SINDBAD performs best at the global scale (${R}^{2}$ = 0.69 and RAE = 0.62), while the performance metrics vary by region (${R}^{2}$ = 0.05–0.67 and RAE = 0.63–3.46). Compared with the ensemble median of TRENDY models, SINDBAD performs comparably at the global scale, but the comparison varies by region. As for NEE ESV, SINDBAD performs better in regions with stronger IAV (e.g., Tropical Asia). The exception is Australia (Figure 6k), where SINDBAD does not capture the strongest land carbon sink in 2016 and 2017, as estimated by OCO-2 and TRENDY models, which appears to dominate the global signal during this period. In 2016, higher rainfall and lower air temperature in Australia enhanced land carbon uptake by vegetation in arid areas, while the region released more CO₂ in 2019 due to drier and hotter conditions (Villalobos et al., 2022). SINDBAD shows a smaller magnitude of variability than OCO-2 and/or TRENDY, for example, in the South American Tropical (Figure 6d), South American Temperate (Figure 6e), and Southern Africa (Figure 6g) regions. On the other hand, SINDBAD shows a signal magnitude comparable to that of OCO-2 in boreal and temperate regions with relatively weak variability, such as the North American Boreal (Figure 6b), North American Temperate (Figure 6c), Eurasian Boreal (Figure 6h), and Eurasian Temperate (Figure 6i) regions, although ${R}^{2}$ is low and RAE is high due to the low magnitude and variability.

SINDBAD performs worse outside the calibration period when compared against IAV from an independent NEE estimate, whereas the ensemble median of TRENDY models is less sensitive to the choice of inversion product used for comparison. In the evaluation against Jena CarboScope (Rödenbeck et al., 2018), the ensemble median of TRENDY models performs better than SINDBAD in simulating the global signal and in a number of TRANSCOM regions (Figure 7). The decrease in SINDBAD's performance is particularly notable in Tropical Asia and Australia, where SINDBAD underestimates the magnitude of NEE IAV compared to OCO-2 estimates. This degradation appears to arise because SINDBAD, as a diagnostic model, cannot reproduce the large fluctuations in Jena CarboScope NEE IAV. Dependence of SINDBAD on the NEE product used for parameter optimization, or the difference between OCO-2 and Jena CarboScope, is unlikely to explain the decrease in performance, as another optimization experiment using Jena CarboScope instead of OCO-2 results in only minor differences in NEE estimates (Figure A7). This means that the difference between the raw time series of Jena CarboScope and OCO-2 is not significant for the parameter calibration of SINDBAD. The clear difference in their NEE IAV, for example, in 2015–2016 (Figure A8), has little influence on the calibration because the raw time series used for calibration is dominated by variability of much larger magnitude than the IAV. This independence of SINDBAD from the NEE constraint used in calibration suggests that improving the simulation requires addressing other factors. These may include redefining the cost function to better emphasize interannual variability or improving the meteorological forcing data, which appear to affect the simulation of interannual variability in global carbon fluxes (e.g., Jung et al., 2020; Nelson et al., 2024).

Overall, SINDBAD reproduces the ESV of GPP and NEE and the NEE IAV estimated by the reference products used in the parameter calibration, and it performs comparably to or better than the ensemble median of TRENDY models. On the other hand, we also find discrepancies in several regions and weaker performance of SINDBAD when it is evaluated over a longer period than the one used for parameter calibration. This feature, together with the wide range of grid cell-wise performance and SINDBAD's tendency to fit regions with stronger flux signals, is expected given how the model is calibrated (i.e., for all regions simultaneously over 5 years). Nevertheless, the reasonable consistency across scales in the CO₂ fluxes among SINDBAD, TRENDY models, and observation-based products shows the potential of SINDBAD and its estimates as valuable resources for diagnosing global carbon cycle dynamics.

Spatial Attribution of the Global CO₂ Fluxes

We attribute the variance of the globally integrated seasonal and interannual variability of CO₂ fluxes to each grid cell to show the spatial pattern of the contributions (Figure 8). Given the varying performance of SINDBAD at the grid cell level (Figure 3), we interpret only the clearer patterns in Figure 8. We expect these patterns to be relatively robust, as they are qualitatively consistent with the results of the regional attribution analysis (see below). For comparison, we provide a similar result using the method of Ahlström et al. (2015) (Figure A13). For ESV, GPP and RECO have positive contributions over the Northern Hemisphere and show a clear contrast between the Southern and Northern Hemispheres. However, this inter-hemispheric compensation, seen in both GPP and RECO, diminishes in the NEE ESV map, so that the Northern high latitudes dominate the seasonal cycle of global NEE. In addition to the inter-hemispheric compensation of GPP and RECO, the weakened positive and negative contributions around the equator and in the Southern Hemisphere may also be partly attributable to the compensatory water effect on GPP and RECO (Jung et al., 2017), whereby simultaneous increases or decreases of GPP and RECO in response to moisture conditions dampen NEE variability. Regarding IAV, South America emerges as the dominant contributor for all three carbon fluxes, but within South America, the spatial distribution for NEE differs from that for RECO and GPP. Large contributions to RECO and GPP IAV are found within the Amazon River basin and in semi-arid areas of eastern Brazil, while contributions to NEE IAV come from a mixture of wet and dry tropical and semi-arid ecosystems. Besides South America, other notable hotspots include the eastern USA, Eurasia, and sub-Saharan regions.

The Northern mid-to-high latitudes contribute positively to both NEE ESV and NEE IAV, implying that GPP and RECO in the region are occasionally decoupled. This region stores large amounts of soil organic carbon. As SINDBAD estimates heterotrophic respiration from the soil carbon pool using a map of soil organic carbon, the contribution of heterotrophic respiration from the slow soil carbon pool (RHcSoil) to RECO is larger in the Northern high latitudes. The higher RHcSoil contribution to RECO weakens the GPP-RECO coupling in the model, thereby allowing the region to contribute more to the variance in global NEE. The emergence of the Northern mid-to-high latitudes in the NEE ESV and NEE IAV contributions may be SINDBAD-specific; indeed, it is much weaker in the ensemble median of TRENDY models (Figure A9). It arises because, in SINDBAD, RA depends on GPP, which strengthens GPP-RECO coupling, whereas RHcSoil depends on SOC. However, a body of empirical evidence indicates that GPP-RECO coupling occurs across different biomes (Migliavacca et al., 2011), supporting the dependency of RA on GPP in SINDBAD. Also, relatively large soil heterotrophic respiration fluxes in the Northern mid-to-high latitudes appear in global maps of heterotrophic respiration reported by other studies (e.g., Nissan et al., 2023). This suggests that, in addition to different responses of RECO and GPP to transient temperature and moisture conditions, soil carbon pool dynamics and their effect on GPP-RECO (de)coupling contribute to the globally emergent NEE IAV.
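The coupling argument rests on the identity $\mathrm{Var}(\mathrm{NEE}) = \mathrm{Var}(\mathrm{RECO}) + \mathrm{Var}(\mathrm{GPP}) - 2\,\mathrm{Cov}(\mathrm{RECO}, \mathrm{GPP})$ for $\mathrm{NEE} = \mathrm{RECO} - \mathrm{GPP}$. The more closely RECO co-varies with GPP, the more the covariance term cancels the two variances. The toy example below uses synthetic series, not SINDBAD output. A hypothetical share of RECO variability is made independent of GPP, standing in for a larger RHcSoil component, and increasing that share increases the variance of NEE.

```python
import numpy as np


def nee_variance_terms(gpp, reco):
    """Var(NEE) and the terms of Var(RECO) + Var(GPP) - 2 Cov(RECO, GPP),
    with NEE = RECO - GPP (population variances, ddof = 0)."""
    gpp, reco = np.asarray(gpp, float), np.asarray(reco, float)
    cov = np.mean((reco - reco.mean()) * (gpp - gpp.mean()))
    return np.var(reco - gpp), np.var(reco), np.var(gpp), cov


# Synthetic monthly series (hypothetical values in arbitrary flux units)
rng = np.random.default_rng(0)
months = np.arange(240)
gpp = 5 + 3 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.3, months.size)
independent = 4.5 + rng.normal(0, 2.0, months.size)   # RECO part unrelated to GPP

for share in (0.0, 0.5, 1.0):
    # share = 0: RECO follows GPP (RA-like); share = 1: RECO independent of GPP
    reco = (1 - share) * 0.9 * gpp + share * independent
    v_nee, v_reco, v_gpp, cov = nee_variance_terms(gpp, reco)
    print(f"independent share {share:.1f}: Var(NEE) = {v_nee:.2f}")
```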

We further investigated the contributions at the regional scale using Köppen-Geiger regions to summarize the spatial pattern of contributions. We provide the model evaluation across Köppen-Geiger regions (Figures A10–A12), similar to Figures 4–6, for direct comparison of model performance and spatial attribution results. For comparison, we also provide the regional contributions quantified using the method of Ahlström et al. (2015) (Figures A14 and A15 for ESV and IAV, respectively). The varying SINDBAD performance across regions, especially for IAV, may affect the attribution. However, the attribution results below are qualitatively consistent between those based on observational constraints and those based on SINDBAD (and TRENDY models), which supports their robustness. Estimates of regional contributions to the variance in the ESV of global CO₂ fluxes are consistent among SINDBAD, constraints, and TRENDY models (Figure 9). They agree that the global ESV of all three CO₂ fluxes is driven by temperate and boreal regions, with a wider spread across TRENDY models, especially in the Boreal region. SINDBAD shows higher contributions from the temperate region for all fluxes than the constraints and TRENDY models, while it underestimates the contribution of the boreal region. This underestimation is probably caused by SINDBAD's underestimated magnitude of GPP ESV in the boreal region (Figure A10). While all products consistently show marginal contributions from tropical and arid regions, SINDBAD and TRENDY models quantify slightly higher contributions of the tropical humid region to NEE ESV than OCO-2, with a relatively wide range across TRENDY models. We highlight that although TRENDY models, SINDBAD, and OCO-2 agree on the general pattern of regional contributions, the underlying reasons differ by region and data set.
For example, in tropical savanna regions, TRENDY models show strong inter-hemispheric compensation, while SINDBAD and OCO-2 do not (Figure A17). The pattern is different in the temperate regions, where the contribution is dominated by the northern part, as identified by TRENDY models, SINDBAD, and OCO-2 alike.

The large contribution of temperate and boreal regions to global GPP ESV agrees with other studies (e.g., Chen et al., 2017); the NEE ESV contribution results are also consistent with estimates from other top-down inversion frameworks (e.g., Krishnapriya et al., 2022) and from another model-data integration framework (e.g., Quetin et al., 2020). This dominance of temperate and boreal regions for GPP and RECO ESV is caused by spatial cancellation around the equator (e.g., Chen et al., 2017). We observe strong positive and negative grid cell-wise (Figures 8a and 8b) and regional (Figures A17a and A17b) contributions to GPP and RECO ESV, while only marginal contributions remain in the tropical humid and savanna regions (Figures 9a and 9b). On the other hand, the strong contribution of temperate and boreal regions to NEE ESV results from temporal cancellation between carbon uptake and release by land, specifically the compensatory water effect on GPP and RECO (Jung et al., 2017), which seems to occur more strongly in subtropical regions. The Northern mid-to-high latitudes drive global NEE ESV (Figure 9). The seasonality of water availability in the Northern mid-to-high latitudes is strongly controlled by snow (Trautmann et al., 2018), and snow also influences the seasonality of carbon fluxes in these regions (Arndt et al., 2020; Yi et al., 2020). Collectively, we speculate that snow may be a key regulator of global NEE ESV and its coupling to water availability.

For interannual variability of CO₂ fluxes, the regions dominating the global variability differ from those for ESV (Figure 10; see Figure A18 for the same results with the Northern and Southern Hemispheres separated within each region). For GPP and RECO IAV, SINDBAD estimates the largest contribution from the tropical humid region, while TRENDY models show qualitatively comparable contributions from all regions, without a dominant contributor. The estimated regional contributions to the global GPP and RECO IAV differ not only between SINDBAD and TRENDY models in this study but also from other studies. For example, Chen et al. (2017) quantified regional contributions to global GPP IAV using various approaches, including an ensemble of process-based models, satellite observations, and a data-driven upscaled product, and consistently identified the largest contributions across those products in South Africa and Oceania, where SINDBAD identifies only mild-to-weak positive contributions (Figure 8). Note, however, that the two studies differ in methodology: (a) the target time period (2001–2015 vs. 1971–2010), (b) the way IAV is calculated, and (c) the way regional contributions are calculated.
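
Differences in how IAV is computed, point (b) above, can change what counts as "variability". As an illustration only, the sketch below uses one common convention, assuming ESV is the mean seasonal cycle (monthly climatology) and IAV is the anomaly of annual means from the long-term mean; the study's own definitions (Figures A2 and A3) may differ, for example in detrending or in using monthly rather than annual anomalies. The flux series is synthetic.

```python
import numpy as np


def esv_and_iav(monthly: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a monthly series of length n_years * 12 into two components.

    ESV: mean seasonal cycle (12 climatological monthly values).
    IAV: anomalies of annual means relative to their long-term mean (one per year).
    This is one common convention, not necessarily the one used in the study.
    """
    by_year = monthly.reshape(-1, 12)   # rows: years, columns: calendar months
    esv = by_year.mean(axis=0)          # climatological seasonal cycle
    annual = by_year.mean(axis=1)       # annual means
    iav = annual - annual.mean()        # interannual anomalies
    return esv, iav


rng = np.random.default_rng(0)
n_years = 10
months = np.arange(12 * n_years)
seasonal = 5.0 * np.cos(2 * np.pi * (months % 12 - 6) / 12)    # hypothetical strong seasonal cycle
yearly_shift = np.repeat(rng.normal(0.0, 0.3, n_years), 12)    # hypothetical weak year-to-year signal
flux = seasonal + yearly_shift

esv, iav = esv_and_iav(flux)
print("ESV variance:", esv.var(), "IAV variance:", iav.var())
```

In this synthetic example the IAV variance is far smaller than the ESV variance by construction; defining IAV from monthly anomalies instead would mix sub-annual noise into the signal and could shift which regions appear dominant.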

For NEE IAV, SINDBAD, TRENDY models, and OCO-2 converge on the finding that the tropical humid region dominates the global variance. Notably, both OCO-2 and TRENDY show a wide range of estimates across ensemble members. SINDBAD's estimated contribution of the tropical humid region to NEE IAV is much smaller than its contributions to GPP and RECO IAV, probably because GPP and RECO are strongly coupled in the model in this region, so that their variations largely offset each other in NEE. The tropical savanna region appears to be the second most important region, as supported by OCO-2 and the TRENDY ensemble median. However, SINDBAD quantifies a slightly larger contribution from the temperate region than from the tropical savanna, and TRENDY ensemble members show a wide range for the temperate region. For the arid region, OCO-2 shows a contribution comparable to that of the tropical savanna region, while SINDBAD and TRENDY models show smaller contributions. Both OCO-2 and TRENDY models show a marginal contribution from the boreal region. Most OCO-2 members and their ensemble median show negative contributions, while SINDBAD and most TRENDY models show positive contributions.
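
Why strong GPP-RECO coupling shrinks NEE variability follows from a standard variance identity. Assuming the sign convention NEE = RECO − GPP (fire excluded), with σ denoting standard deviations and ρ the correlation between GPP and RECO:

```latex
\begin{aligned}
\operatorname{Var}(\mathrm{NEE})
  &= \operatorname{Var}(\mathrm{RECO}) + \operatorname{Var}(\mathrm{GPP})
     - 2\,\operatorname{Cov}(\mathrm{RECO}, \mathrm{GPP}) \\
  &= \sigma_{\mathrm{RECO}}^{2} + \sigma_{\mathrm{GPP}}^{2}
     - 2\rho\,\sigma_{\mathrm{RECO}}\,\sigma_{\mathrm{GPP}} \\
  &= 2\sigma^{2}(1 - \rho)
     \quad \text{if } \sigma_{\mathrm{RECO}} = \sigma_{\mathrm{GPP}} = \sigma .
\end{aligned}
```

As ρ approaches 1, NEE variance vanishes even when GPP and RECO each vary strongly, so a region can dominate GPP and RECO IAV while contributing much less to NEE IAV. The same identity underlies the temporal cancellation between uptake and release invoked for NEE ESV.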

On the other hand, regional contributions to NEE IAV quantified using the method by Ahlström et al. (2015) show smaller contributions from tropical regions and larger contributions from arid regions (Figure A15c) than our method (Figure 10c). Under both methods, tropical humid regions still contribute substantially to the global NEE IAV relative to tropical savanna regions. We note that results using the heuristic index by Ahlström et al. (2015) need to be interpreted with caution, as the method does not account for spatial covariance, which plays a significant role in land-atmosphere CO₂ fluxes (Jung et al., 2017). For example, temperate and boreal regions have negative spatial covariance of NEE IAV (Figure A16), implying that contributions from these regions may be overestimated when the covariance is ignored (e.g., Figure A15c). The qualitatively distinct results of the two methods underscore the importance of the chosen methodology in spatial attribution studies, calling for caution in applying these methods and for further methodological investigation.
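
The difference between the two attribution approaches can be sketched on synthetic data. The Ahlström et al. (2015) index is f_j = Σ_t (x_jt |X_t| / X_t) / Σ_t |X_t|, where x_jt is the anomaly of region j in year t and X_t = Σ_j x_jt is the global anomaly. For contrast, the sketch uses a covariance-based share, cov(x_j, X)/var(X), as an assumed stand-in for a covariance-aware method; it is not necessarily the exact formulation used in this study. The three regions are hypothetical, with regions B and C constructed to be anticorrelated, and anomalies are simple deviations from the mean (no detrending).

```python
import numpy as np


def ahlstrom_index(regional: np.ndarray) -> np.ndarray:
    """Ahlström et al. (2015) index f_j = sum_t(x_jt * |X_t| / X_t) / sum_t |X_t|.

    regional: (n_regions, n_years); anomalies are taken per region and X_t is
    their sum, the global anomaly. |X_t| / X_t is simply sign(X_t).
    """
    anom = regional - regional.mean(axis=1, keepdims=True)
    total = anom.sum(axis=0)
    return (anom * np.sign(total)).sum(axis=1) / np.abs(total).sum()


def covariance_share(regional: np.ndarray) -> np.ndarray:
    """Covariance-based share cov(x_j, X) / var(X) of each region in the global variance."""
    anom = regional - regional.mean(axis=1, keepdims=True)
    total = anom.sum(axis=0)
    return anom @ total / (total @ total)


rng = np.random.default_rng(42)
n_years = 30
region_a = rng.normal(size=n_years)                           # independent variability
region_b = 0.8 * rng.normal(size=n_years)                     # independent variability
region_c = -0.7 * region_b + 0.2 * rng.normal(size=n_years)   # anticorrelated with B
regions = np.vstack([region_a, region_b, region_c])

f_ahlstrom = ahlstrom_index(regions)
f_cov = covariance_share(regions)
print("Ahlström:", np.round(f_ahlstrom, 2), "covariance-based:", np.round(f_cov, 2))
```

Both indices sum to one across regions, but only the covariance-based share decomposes var(X) explicitly into each region's variance plus its covariances with all other regions, so anticorrelated regions can receive shares of opposite sign that partly offset; merging two regions simply adds their covariance-based shares.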

The emergence of the tropical humid region as a significant mediator of the interannual variability of the global land carbon sink is notable, since semi-arid regions have so far been regarded as the most relevant actors (e.g., Ahlström et al., 2015; Poulter et al., 2014; Zhang et al., 2018). Several studies suggest that semi-arid regions may not be the only dominant contributor to the global land carbon sink variability. Piao et al. (2020) compared the net land carbon flux estimated by land carbon models, global atmospheric inversions, and FLUXCOM and found that tropical semi-arid and non-semi-arid regions contribute comparably to the IAV of the net land carbon flux. Levine et al. (2023) used a model-data fusion framework and found a dominant role of tropical humid regions in the IAV of tropical net biosphere production (NBP = −NEE − fire) in the 21st century, owing to water stress that is more attributable to atmospheric demand than to supply. The importance of humid regions for the IAV of carbon fluxes has also been found at the national scale: H. Li et al. (2021) analyzed simulations of an ensemble of terrestrial biosphere models and found that humid regions contributed the most (62%) to the IAV of net primary productivity in China from 1982 to 2018. Finally, although we find strong agreement among SINDBAD, TRENDY models, and OCO-2 on the significant role of the tropical humid region in the global NEE IAV, we also observe a large uncertainty in this region across both ensemble products, which remains large under the alternative attribution method (Figure A15c).
A similarly wide range of contributions from humid tropical regions was found in a comparison of GPP IAV among process models, satellite remote sensing observations, and a data-driven upscaled product (e.g., Chen et al., 2017), owing to fewer observations (e.g., Pastorello et al., 2020), the low quality and limited amount of satellite data due to cloud contamination (Jung et al., 2020), and less well understood carbon cycle processes, such as leaf phenology (e.g., Pau et al., 2011; Piao et al., 2019) and vegetation responses to environmental conditions (Restrepo-Coupe et al., 2017).

Limitations

The relative importance of each region derived from our regional attribution analysis may be affected by processes that SINDBAD does not simulate, including wildland fires, deforestation, and agriculture. In our analyses, fire C emissions have been excluded, which may cause an underestimation of the contribution from regions with frequent wildfires, such as semi-arid regions. Also, the spatial filtering used to define the study domain does not exclude heavily deforested or cultivated grid cells. To test the robustness of the conclusions against fire CO₂ emissions, deforestation, and cultivation, we repeated the regional attribution analysis of the global CO₂ fluxes (a) including fire CO₂ emissions, (b) excluding heavily deforested grid cells, or (c) excluding heavily cultivated grid cells (see Appendix B for details). In all three cases, our conclusions remain unchanged (Figures B1–B6). The consistency of SINDBAD's regional attribution results, regardless of heavily deforested or cultivated grid cells, is probably due to two factors: (a) the subset of grid cells used for parameter optimization includes some of the heavily deforested or cultivated grid cells (Figure B9), and (b) the smoothed daily MODIS EVI prescribed to SINDBAD contains this information (e.g., Biradar & Xiao, 2011; L. Li et al., 2014).

Some uncertainties cannot be addressed by these additional tests owing to limitations of the current state of the art. For example, the effect of wildland fire is tested using (a) SINDBAD NEE plus GFED fire C emissions, (b) raw OCO-2 NEE without excluding fire C emissions, and (c) TRENDY net biome exchange estimates from the S3 experiment, which include land use change and fire emissions. SINDBAD NEE plus GFED fire C emissions accounts only for the direct effect of fire (i.e., CO₂ emission), leaving indirect effects, such as post-fire tree mortality, untested. OCO-2 relies on GFED fire C emissions to identify fire effects (Byrne et al., 2023), and GFED fire emissions have their own uncertainties, such as in estimating emission factors and fuel consumption (van der Werf et al., 2017). The fire modules in TRENDY models also have uncertainties, such as in representing peatland fires and forest mortality processes (Piao et al., 2020).

Finally, part of the uncertainty in the regional attribution of the global IAV of CO₂ fluxes may stem from the parameter calibration. SINDBAD parameters are calibrated simultaneously for all TRANSCOM land regions using either the raw time series or the ESV of the constraints. As IAV is a weak signal compared to the raw time series or ESV, the optimized parameters may be less representative of IAV, although SINDBAD generally performs comparably well in simulating the IAV of CO₂ fluxes globally and across regions.

Conclusions

We developed SINDBAD, a parsimonious process-based model of the land water and carbon cycles. Its parameters were constrained using multiple observation-based products, including GRACE TWS, OCO-2 NEE, and FLUXCOM GPP. With the optimized parameter set, SINDBAD reasonably reproduced the observed water and carbon patterns. That this was achieved with a simple calibration setup (e.g., a subset of grid cells and years for parameter calibration and spatiotemporally constant parameters for simulation) demonstrates the value of model-data integration.

SINDBAD performed comparably to or better than the ensemble of state-of-the-art dynamic global vegetation models (i.e., TRENDY v9) in simulating seasonal and interannual variability of carbon fluxes at regional and global scales. Despite its remaining limitations, this performance shows the model's potential for diagnosing patterns of water and carbon cycles.

Using the optimized model as well as TRENDY models and observation-based products, we assessed the relative contributions of land grid cells and climate regions to seasonal and interannual variability of land-atmosphere CO₂ fluxes, including GPP, RECO, and NEE. SINDBAD, TRENDY models, and observation-based constraints agreed on the significant role of temperate and boreal regions in the Northern Hemisphere in the seasonal variability of carbon fluxes, which resulted from spatial (GPP and RECO) and temporal (NEE) cancellation. For interannual variability of NEE, all three products quantified a large contribution from tropical humid regions, though TRENDY models and OCO-2 showed large spreads among ensemble members, indicating the need for better understanding and more observations of water and carbon cycles in these regions. For the IAV of GPP and RECO, no consistent dominant contributor emerged across SINDBAD and TRENDY models, and we lacked a reliable observation-based product to validate the model simulations.

Overall, our study advances the understanding of spatial contributions to global land-atmosphere CO₂ variability at seasonal and interannual scales by leveraging the synergy of process-based models and observation-based products to reduce parameter uncertainties. The significance of tropical humid regions in the global NEE IAV calls for revisiting the previous focus on semi-arid regions. Furthermore, our study reveals substantial disparities both among TRENDY models and among OCO-2 ensemble members in estimating the covariation of land-atmosphere CO₂ fluxes in tropical humid regions. Limited knowledge and modeling skill of carbon processes in tropical humid regions may not only impede the accurate diagnosis of global NEE IAV hotspots but also hinder a comprehensive understanding of the coupling between water and carbon cycles.

Appendix

Appendix A - Additional Table and Figures

The additional table and figures in Appendix A provide further details on some aspects of the study, including regional classification (Figure A4), model formulation (Figure A1), calculations of IAV and ESV (Figures A2 and A3), model evaluation (Table A1, Figures A5–A8 and A10–A12), and spatial attribution analyses (Figures A9 and A13–A18).

Table A1. Coefficient of Determination ($R^2$) and Relative Absolute Error (RAE) Between SINDBAD Estimates and Assimilated Constraints Across TRANSCOM Land Regions

Each entry gives $R^2$ (RAE). ESV: expected seasonal variability; IAV: interannual variability; –: not available.

| Region | ESV GPP | ESV NEE | ESV TWS | ESV ET | ESV SWE | ESV Q | IAV NEE | IAV TWS | IAV SWE |
|---|---|---|---|---|---|---|---|---|---|
| Global | 0.99 (0.04) | 0.99 (0.15) | 0.80 (0.45) | 0.99 (0.11) | 0.99 (0.17) | 0.76 (0.15) | 0.69 (0.62) | 0.55 (0.71) | 0.56 (0.85) |
| 1: North American Boreal | 0.99 (0.14) | 0.98 (0.35) | 0.79 (0.41) | 0.98 (0.19) | 0.99 (0.13) | 0.53 (0.45) | 0.13 (1.02) | 0.12 (0.91) | 0.69 (0.59) |
| 2: North American Temperate | 0.99 (0.09) | 0.92 (0.30) | 0.82 (0.44) | 0.98 (0.10) | 0.97 (0.26) | 0.83 (0.32) | 0.44 (1.82) | 0.58 (0.73) | 0.85 (0.39) |
| 3: South American Tropical | 0.01 (0.17) | 0.33 (1.88) | 0.99 (0.36) | 0.00 (0.07) | – | 0.82 (0.13) | 0.32 (0.95) | 0.67 (0.62) | – |
| 4: South American Temperate | 0.99 (0.13) | 0.71 (0.75) | 0.88 (0.39) | 0.96 (0.26) | – | 0.97 (0.47) | 0.13 (0.91) | 0.33 (0.81) | – |
| 5: Northern Africa | 0.95 (0.16) | 0.84 (0.41) | 0.95 (0.32) | 0.99 (0.21) | 0.37 (0.91) | 0.93 (0.35) | 0.41 (0.71) | 0.17 (0.93) | 0.04 (0.94) |
| 6: Southern Africa | 1.0 (0.12) | 0.68 (0.56) | 0.91 (0.29) | 1.00 (0.30) | – | 0.89 (0.31) | 0.28 (0.84) | 0.03 (1.01) | – |
| 7: Eurasian Boreal | 1.0 (0.05) | 0.93 (0.24) | 0.8 (0.44) | 0.97 (0.17) | 0.97 (0.19) | 0.75 (0.32) | 0.22 (0.9) | 0.34 (0.80) | 0.61 (0.8) |
| 8: Eurasian Temperate | 0.99 (0.29) | 0.69 (0.88) | 0.89 (0.33) | 0.99 (0.20) | 0.99 (0.19) | 0.79 (0.96) | 0.07 (3.46) | 0.39 (0.81) | 0.80 (0.59) |
| 9: Tropical Asia | 0.95 (0.05) | 0.27 (0.85) | 0.98 (0.37) | 0.97 (0.07) | – | 0.89 (0.22) | 0.64 (0.63) | 0.38 (0.81) | – |
| 10: Australia | 0.19 (0.35) | 0.14 (0.96) | 0.93 (0.58) | 0.95 (0.15) | – | 0.83 (1.90) | 0.67 (0.79) | 0.46 (0.88) | – |
| 11: Europe | 0.99 (0.11) | 0.97 (0.16) | 0.89 (0.40) | 0.98 (0.13) | 0.99 (0.23) | 0.70 (0.33) | 0.05 (2.27) | 0.00 (1.10) | 0.66 (0.79) |

Appendix B - Effects of Fire, Deforestation, and Agriculture

We tested the robustness of the regional attribution results (Figures 9 and 10) against fire CO₂ emissions, deforestation, and cultivation by repeating the regional attribution analysis (a) including fire CO₂ emission estimates, (b) excluding heavily deforested grid cells, and (c) excluding heavily cultivated grid cells. In summary, including or excluding these emissions or grid cells does not change our conclusions (Figures B1–B6).

For fire CO₂ emissions, we added fire CO₂ emissions from the Global Fire Emissions Database (GFED) version 4 (Giglio et al., 2013) to SINDBAD NEE estimates. Note that this accounts only for the direct effect of fire emissions and cannot test indirect effects of fire on the carbon cycle, such as post-fire tree mortality. For OCO-2, we used its raw estimates instead of subtracting GFED fire CO₂ emissions. For TRENDY, we used the net biome productivity (NBP) estimates from the S3 experiment, which account for emissions from land use change and fire.

For heavily deforested or cultivated grid cells, we used corresponding gridded masks to filter out these grid cells before aggregating them into regional CO₂ flux time series and quantifying the regional contributions. For the mask of deforested grid cells, we used the global forest cover fraction map by M. C. Hansen et al. (2013). We calculated the forest cover fraction change between 2001 and 2019 (i.e., the study period) and defined grid cells as “heavily deforested” if the change was below the 10th percentile (i.e., the 10% most negative changes) (Figure B7). For the mask of cultivated grid cells, we used the MODIS Global Land Cover product MCD12Q1 version 6 (Friedl & Sulla-Menashe, 2019) and defined heavily cultivated grid cells as those with a cropland fraction of more than 50% (Figure B8). Note that the subset of grid cells used for parameter optimization includes some of the heavily deforested or heavily cultivated grid cells (Figure B9).
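
The masking steps described above can be sketched as follows. This is a minimal illustration with hypothetical inputs: the arrays stand in for gridded forest-cover fractions (e.g., derived from the Hansen et al. map), cropland fractions (e.g., derived from MCD12Q1), and a monthly NEE field on the same land grid. The function names are hypothetical, and computing the percentile over all valid land cells is an assumption. Each mask is applied separately, mirroring cases (b) and (c).

```python
import numpy as np


def deforested_mask(forest_2001, forest_2019, pct=10.0):
    """Flag cells whose forest-cover change (2019 minus 2001) lies below the pct-th percentile.

    The percentile is taken over all non-NaN cells (an assumption); NaN cells are never flagged.
    """
    change = forest_2019 - forest_2001
    return change < np.nanpercentile(change, pct)


def cultivated_mask(cropland_fraction, threshold=0.5):
    """Flag cells whose cropland fraction exceeds `threshold` (50% by default)."""
    return cropland_fraction > threshold


def regional_series(flux, region_id, exclude):
    """Sum a (n_cells, n_time) flux over the retained cells of each region."""
    ids = np.unique(region_id)
    keep = ~exclude
    return ids, np.vstack([flux[keep & (region_id == r)].sum(axis=0) for r in ids])


# Hypothetical inputs: 1000 land cells, 24 monthly time steps, 5 regions
rng = np.random.default_rng(1)
n_cells = 1000
forest_2001 = rng.uniform(0.0, 1.0, n_cells)
forest_2019 = forest_2001 * rng.uniform(0.5, 1.0, n_cells)  # only forest losses, for simplicity
cropland = rng.uniform(0.0, 1.0, n_cells)
region_id = rng.integers(0, 5, n_cells)
nee = rng.normal(size=(n_cells, 24))

deforested = deforested_mask(forest_2001, forest_2019)
cultivated = cultivated_mask(cropland)
# Each robustness test excludes one mask at a time
ids, nee_no_deforest = regional_series(nee, region_id, deforested)
_, nee_no_crop = regional_series(nee, region_id, cultivated)
print(deforested.sum(), cultivated.sum(), nee_no_deforest.shape)
```

The masked regional series would then enter the same attribution analysis as the unmasked case, so any change in the regional contributions can be traced to the excluded cells alone.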

Acknowledgments

Hoontaek Lee acknowledges support from the Max Planck Institute for Biogeochemistry (MPI-BGC) and the International Max Planck Research School for Global Biogeochemical Cycles (IMPRS-gBGC). Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. Part of the funding for this study was provided through NASA Carbon Cycle Science (Grant NNH20ZDA001N-CARBON). Javier Pacheco-Labrador was supported by the ESA Living Planet Fellowship IRS4BEF “Integrated Remote Sensing for Biodiversity-Ecosystem Function” (C.N.4000140028/22/I-DT-lr), a program of and funded by the European Space Agency. We thank Dr. Sophia Walther at MPI-BGC for the preparation of the vegetation index data used for the SINDBAD simulation. We thank the Associate Editor and three reviewers for their careful reading and constructive comments on the manuscript. Open Access funding enabled and organized by Projekt DEAL.

Data Availability Statement

SINDBAD simulation results are available at (Lee et al., 2025a). The scripts for processing data and producing figures can be accessed via a public repository on GitHub: (Lee et al., 2025b). TRENDY v9 simulation results are available upon request (Global Carbon Project, 2021). GPCP 1dd v1.3 precipitation is available at (Adler et al., 2020). CRUJRA v2.2 is available at (Harris, 2021). CERES SYN1degEd4A is available at (NASA/LARC/SD/ASDC, 2017). GRACE terrestrial water storage anomalies are available at (Wiese et al., 2023). The GlobSnow v3 product is available at (Luojus et al., 2020). G-RUN Ensemble v1 is freely available at (Ghiggi, Humphrey, Gudmundsson, & Seneviratne, 2021). FLUXCOM fluxes (Jung et al., 2019, 2020) were obtained from (FluxCom, n.d.). OCO-2 v10 MIP fluxes (Byrne et al., 2023) are publicly available from (Byrne et al., 2022). Jena CarboScope fluxes (Rödenbeck et al., 2018) are available at (Rödenbeck, n.d.). MODIS EVI of FluxnetEO is freely available at (Walther et al., 2021). The maximum rooting depth by Y. Fan et al. (2017) is available upon request. The effective rooting depth by Yang et al. (2016b) is freely available from the CSIRO Data Access Portal (Yang et al., 2016a). The maximum soil water storage capacity by Wang-Erlandsson et al. (2016) is publicly available. The maximum plant-available water capacity by Tian et al. (2019) is available upon request. The WoSIS snapshot 2019 is freely available at (Batjes et al., 2019). GFED v4 burned area is available at (Randerson et al., 2017). The Hansen global forest cover map (M. C. Hansen et al., 2013) was obtained from (Global Forest Change, n.d.). MODIS Global Land Cover product MCD12Q1 v6 will be decommissioned soon, but v6.1 is available at (Friedl & Sulla-Menashe, 2019).

References

Adler, R., Wang, J.‐J., Sapiano, M., Huffman, G., Bolvin, D., Nelkin, E., & NOAA CDR Program. (2020). Global precipitation climatology project (GPCP) climate data record (CDR), version 1.3 (daily) [Dataset]. Boulder CO: Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory. [DOI: https://dx.doi.org/10.5065/ZGJD-9B02]

Word count: 13053

© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The spatial contribution to the global land‐atmosphere carbon dioxide (CO₂) exchange is crucial for understanding and projecting the global carbon cycle, yet different studies diverge on the dominant regions. Informing land models with observational data is a promising way to reduce parameter and structural uncertainties and advance our understanding. Here, we develop a parsimonious diagnostic process‐based model of land carbon cycles, constraining its parameters with observation‐based products. We compare CO₂ flux estimates from our model with observational constraints and the Trends in Net Land‐Atmosphere Carbon Exchange (TRENDY) model ensemble to show that our model reasonably reproduces the seasonality of net ecosystem exchange (NEE) and gross primary productivity (GPP) and the interannual variability (IAV) of NEE. Finally, we use the developed model, TRENDY models, and observational constraints to attribute variability in global NEE and GPP to regional variability. The attribution analysis confirms the dominance of Northern temperate and boreal regions in the seasonality of CO₂ fluxes. Regarding NEE IAV, we identify a significant contribution from tropical savanna regions, as previously reported. Furthermore, we find that tropical humid regions are at least as relevant contributors as semi‐arid regions. At the same time, the largest uncertainty among ensemble members of the NEE constraint and TRENDY models occurs in the tropical humid regions, underscoring the need for better process understanding and more observations in these regions. Overall, our study identifies tropical humid regions as key regions both for global land‐atmosphere CO₂ exchange and for the inter‐model spread in its modeling.

Details

Title

Spatial Attribution of Temporal Variability in Global Land‐Atmosphere CO2 Exchange Using a Model‐Data Integration Framework

Author

Lee, H.¹; Jung, M.²; Carvalhais, N.³; Reichstein, M.⁴; Forkel, M.⁵; Bloom, A. A.⁶; Pacheco‐Labrador, J.⁷; Koirala, S.²

¹ Max Planck Institute for Biogeochemistry, Jena, Germany; Technische Universität Dresden, Institute of Photogrammetry and Remote Sensing, Dresden, Germany
² Max Planck Institute for Biogeochemistry, Jena, Germany
³ Max Planck Institute for Biogeochemistry, Jena, Germany; Departamento de Ciências e Engenharia do Ambiente, Faculdade de Ciências e Tecnologia, Universidade Nova Lisboa, Costa da Caparica, Portugal; ELLIS Unit Jena, Michael Stifel Center Jena for Data‐Driven and Simulation Science, Jena, Germany
⁴ Max Planck Institute for Biogeochemistry, Jena, Germany; ELLIS Unit Jena, Michael Stifel Center Jena for Data‐Driven and Simulation Science, Jena, Germany
⁵ Technische Universität Dresden, Institute of Photogrammetry and Remote Sensing, Dresden, Germany
⁶ Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA
⁷ Environmental Remote Sensing and Spectroscopy Laboratory (SpecLab), Spanish National Research Council (CSIC), Madrid, Spain

Section

Research Article

Publication year

2025

Publication date

Mar 1, 2025

Publisher

John Wiley & Sons, Inc.

e-ISSN

1942-2466

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1029/2024MS004479

ProQuest document ID

3181722409
