Introduction
The global carbon cycle is an important biogeochemical cycle that affects the climate on Earth (Schimel, 2001). Terrestrial vegetation, which covers a large part of the land area, assimilates atmospheric carbon dioxide (CO2) through photosynthesis. Simultaneously, CO2 of similar magnitude is released into the atmosphere through terrestrial ecosystem respiration (TER). The net balance of these two fluxes determines whether a terrestrial ecosystem acts as a sink or source of carbon (Ruehr et al., 2023). Terrestrial gross primary production (GPP) can be defined as “apparent” photosynthesis, that is, the rate at which vegetation assimilates carbon through photosynthesis minus the carbon lost through photorespiration alone (Plummer, 2006; Wohlfahrt & Gu, 2015). GPP can be estimated directly using gas exchange measurements at the leaf and canopy scales (Jez et al., 2021), and indirectly through measurements of net ecosystem exchange (NEE) using the eddy covariance (EC) method at the ecosystem or landscape scale (D. D. Baldocchi, 2003). Although the GPP estimated using the EC method represents “apparent” photosynthesis, its magnitude can be closer to “true” photosynthesis, the actual amount of carbon assimilated, because daytime mitochondrial respiration is overestimated in the flux-partitioning algorithm (Reichstein et al., 2005; Wohlfahrt & Gu, 2015). Furthermore, a large variety of biogeochemical models have been developed to simulate and upscale carbon fluxes from local to regional or global scales to better describe the global carbon cycle (Burton et al., 2023; Dannenberg et al., 2023; Nelson et al., 2024; Xiao et al., 2014).
Biogeochemical models that simulate GPP differ in type and complexity. On the one hand, process-based models, such as the models used in the Trends in Net Land-Atmosphere Carbon Exchange (TRENDY) project, mechanistically describe the physiological processes involved in photosynthesis or plant respiration (Sitch et al., 2015). Their ability to capture a given process largely depends on the underlying model structure and the calibration of model parameters (Anav et al., 2015). Similar to, but simpler than, fully mechanistic approaches are models built on the concept of light use efficiency (LUE), which usually treat the canopy as one big leaf and calculate GPP as the product of the absorbed photosynthetically active radiation (aPAR) and LUE (Monteith, 1972). These models are semi-empirical, as they combine the simplicity of empirical models with the theoretical mechanisms that underpin process-based models (J. Chen, 2021; Running et al., 2000; Yuan et al., 2007). On the other hand, data-driven empirical models (Jung et al., 2011, 2020) are largely based on learning regression functions that establish a general relation between input data, such as meteorology and ecosystem properties, and the desired output, such as GPP. The performance of these models relies heavily on good-quality training data, and they generally lack comprehensive representations of long-term forcing functions.
Considering the methodological diversity and differences in GPP estimates, various model benchmarking and model–data integration experiments have been designed to compare approaches and to unveil drivers of ecosystem functioning for various climate–vegetation types, across spatial and temporal scales. A long-standing challenge, and still a key area of interest, lies in understanding the factors controlling inter–annual variability (IAV) of the various carbon fluxes (D. Baldocchi et al., 2018). The challenge presents itself from the mechanistic to the more data–driven approaches and contests the dominant role of meteorology in determining the IAV of ecosystem fluxes (Richardson et al., 2007). At the local ecosystem level, J. Wu et al. (2012) looked at the IAV of net ecosystem fluxes by fitting the parameters of a semi-empirical model at shorter timescales to capture the seasonality, but also annual variability of model parameters. The approach allows testing the role of changes in ecosystem functioning in the IAV of carbon fluxes (Richardson et al., 2007). They concluded that climate and parametric variability control IAV of ecosystem fluxes at shorter and longer timescales, respectively.
Simultaneously, Fatichi and Ivanov (2014) highlighted the role of climate: using 200 years of hourly synthetic meteorological data to force an ecohydrological model, they found that the random occurrence of favorable weather conditions at certain hours of the day can be a major predictor of the IAV of net primary production (NPP). This statistical relationship was corroborated by Zscheischler et al. (2016) using actual flux data from EC sites in forested areas of North America, where the 91st percentile values of hourly GPP flux, that is, peak GPP values, substantially contributed to the IAV of GPP flux. These studies highlight the correlation between the distribution tails and the IAV of EC fluxes. However, there is no robust pattern across sites, nor do these studies challenge the assumption that ecosystem function does not vary.
More recently, a model selection study compared an ensemble of 5600 possible semi-empirical LUE model structures to find a global best model structure (Bao, Wutzler, et al., 2022). The best LUE model was calibrated at a daily timescale per site and simulated GPP fluxes across the FLUXNET EC tower sites (Pastorello et al., 2020), considering the effect of various environmental conditions on maximum LUE through partial sensitivity functions. Though the best global model performed similarly to the best model selected for each site at the daily resolution, it failed to represent the variability of annually aggregated GPP fluxes for 74% of sites, that is, the Nash-Sutcliffe efficiency (NSE) of model performance (Nash & Sutcliffe, 1970) was below or equal to 0.5. This may be attributed to (a) the use of daily data in the study, as the model had no information on the favorable conditions that occurred in a diurnal cycle and failed to simulate the diurnal GPP peaks which had a major influence on IAV (Bao, Wutzler, et al., 2022; Fatichi & Ivanov, 2014; Zscheischler et al., 2016), (b) the assumption of invariance in ecosystem function, that is, values of model parameters remain constant for all site–years in a site, and (c) the need to explicitly consider different timescales in the cost function (Desai, 2010).
In contrast, Mengoli et al. (2022) proposed an optimality-based framework (Stocker et al., 2020; Wang et al., 2017), that is, the optimality-based P-model, which simulates GPP following the optimality principle and differentiates between instantaneous and acclimated photosynthetic responses. This model demonstrated its capability to simulate half-hourly GPP dynamics at 10 EC sites covering four vegetation classes for limited time periods. However, the performance of this modeling framework across sites representing diverse climate–vegetation features and at various temporal resolutions was not evaluated. Although this modeling framework considers the effects of temperature, vapor pressure deficit (VPD), atmospheric CO2 concentration, solar radiation, and the fraction of absorbed photosynthetically active radiation (fAPAR), it does not explicitly consider the effect of drought stress on GPP variability at the sub-daily scale. However, numerous studies (Anderegg et al., 2015; Assal et al., 2016; Kannenberg et al., 2019; Reichstein et al., 2013; C. Wu & Wang, 2022) have highlighted, through observations and simulations, the reduction of GPP under climatic extremes such as drought, and ecosystems can temporarily act as a carbon source during drought (Ciais et al., 2005). Moreover, various studies (Müller & Bahn, 2022; Orth et al., 2020; Seneviratne et al., 2012; X. Yu et al., 2022) found that drought can impose a prolonged legacy effect on the seasonal and annual dynamics of GPP, and that recovery times can vary between ecosystems (Z. Yu et al., 2017). Recently, Mengoli et al. (2023) proposed an improved version of this model by incorporating climatic aridity and calculating a scaling factor for GPP. However, in the improved model, the scaling factor could only be applied to improve the simulation of daily GPP.
The challenge to correctly reproduce IAV is also apparent on a global scale as Anav et al. (2015) found a high level of disagreement in annual GPP estimates from diverse global GPP modeling frameworks. These discrepancies highlight that site-level limitations in simulating IAV propagate to larger scales where additional mechanisms play a role, such as natural or anthropogenic disturbances and land–use land cover change (Bultan et al., 2022; McGuire et al., 2001).
As highlighted above, the major persistent drawbacks of most past studies were the limited implementation of either model structures or parameterization approaches and the evaluation of models for a limited set of sites and timescales. To address this, we explore ecosystem-level estimations of GPP flux to systematically investigate how various factors can be linked to the IAV of GPP flux, such as peak values of diurnal GPP, climatic conditions, and quantities represented by model parameters, which are usually hard to measure directly and can be difficult to interpret even when various modeling approaches are adopted. We tested the impact of constant and time-varying parameterizations and evaluated their performance in capturing GPP variability at various temporal aggregation scales, especially at the annual scale. We also tested the hypotheses that observational constraints complement and enhance theoretically grounded process formulations and that improving the model simulations at the sub-daily scale improves the prediction of the IAV of GPP. Additional analyses of parameter inversion approaches and cost functions, as well as of parametric variability, are treated in a companion paper (De, Brenning, et al., 2025). In this study, we aim to answer the following questions:
- How well does an optimality-based model perform compared to a semi-empirical model across various temporal scales with different model parameterization approaches?
- What factors influence the variability of model performance at different temporal scales?
- Can the performance of an optimality-based model be improved if drought stress is included?
- How much are the differences in model performance between an optimality-based and a semi-empirical model related to variations across plant–functional types (PFT) and climate–vegetation types?
- Does improved simulation of peak diurnal GPP lead to improved simulation of the IAV of GPP?
Methods and Data
In this study, we focused on the parameterization of both a semi-empirical model, at daily and sub-daily scales, and an optimality-based model, at a sub-daily scale, using various parameterization strategies consisting of different subsets of data and cost functions (Figure 1). Thereafter, we performed forward runs of the models with calibrated parameters at the temporal resolution of the parameterization data and evaluated model performance at different temporal aggregations (Figure 1). The following sections describe each methodological step in detail.
[IMAGE OMITTED. SEE PDF]
Models
Optimality-Based Model: P-Model of Mengoli
Stocker et al. (2020) proposed the first version of the P-model based on theories formulated by Wang et al. (2017), which unified the classic Farquhar–von Caemmerer–Berry model (Farquhar et al., 1980) with the simplified formulation of big-leaf LUE models (Monteith, 1972). The underlying equations of the P-model were formulated based on the optimality principle (Prentice et al., 2014) and the coordination principle (J.-L. Chen et al., 1993; Maire et al., 2012). According to the optimality principle, plants optimize the cost of transpiring water to assimilate CO2 through the stomata. In the P-model, the ratio of leaf-internal to ambient CO2 concentration is calculated for which this cost is minimal, and its sensitivity to VPD is predicted. The coordination principle describes how plants achieve an equilibrium between the maximum rates of carboxylation and electron transport.
Mengoli et al. (2022) adapted the first version of the P-model to simulate half-hourly GPP dynamics. Here, we applied this same model at an hourly scale and called it the model. The major improvement in this version was an explicit differentiation between instantaneous responses (such as RuBisCO- and light-limited carbon assimilation) and photosynthetic responses (, , and ) that acclimate over time in response to environmental conditions. One important aspect of this model is that the parameters associated with cellular biochemistry acclimate to favorable conditions during the day over a period of time, the acclimation time (). In this study, we considered the favorable condition to be the average of three hourly input data points in the middle of the day, at 11:00 (hh:mm) LT, 12:00 (hh:mm) LT, and 13:00 (hh:mm) LT. A rolling mean of this mid-day average condition was taken over the acclimation time and used to calculate the optimal values of the model parameters, as described in Mengoli et al. (2022). The acclimation time was calibrated as a parameter in our study (Table 1). We chose the mid-day and rolling-mean approach from Mengoli et al. (2022) as it produced the best results in their evaluations of the model at the half-hourly scale.
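To make the acclimation step concrete, the sketch below shows one way the mid-day averaging and rolling mean could be implemented; it assumes hourly forcing in a pandas DataFrame in local time and uses placeholder names rather than the original model code.

```python
import pandas as pd

def acclimated_forcing(df_hourly: pd.DataFrame, tau_days: int) -> pd.DataFrame:
    """Average the mid-day (11:00-13:00 LT) conditions of each day and smooth them
    with a rolling mean over the acclimation time `tau_days` (a calibrated parameter).

    `df_hourly` is assumed to hold hourly drivers (e.g., air temperature, VPD, PPFD)
    on a DatetimeIndex; column names are illustrative only.
    """
    midday = df_hourly.between_time("11:00", "13:00")
    daily_midday = midday.resample("1D").mean()  # mean of the three mid-day hours per day
    # Rolling mean over the acclimation window; the acclimated (optimal) values of the
    # slowly varying photosynthetic traits would then be derived from these drivers.
    return daily_midday.rolling(window=tau_days, min_periods=1).mean()
```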
Table 1 Description, Range, Initial Values, and Units of Calibrated Model Parameters
| fX/model name | Symbol | Definition | Initial value | Lower bound | Upper bound | Units | Reference |
| , models | Length of acclimation time | 18 | 1 | 100 | Days | after Mengoli et al. (2022) | |
| , models | Maximum light use efficiency | 0.04 | 0 | 0.13 | Skillman (2008) | ||
| (, models) | Optimal temperature | 10 | 5 | 35 | °C | Bao, Wutzler, et al. (2022) | |
| Sensitivity to temperature changes | 2 | 1 | 20 | Bao, Wutzler, et al. (2022) | |||
| Lag parameter for temperature effect | 0.29 | 0 | 0.9 | – | Bao, Wutzler, et al. (2022) | ||
| (, models) | Sensitivity to VPD changes | − | −0.01 | − | Bao, Wutzler, et al. (2022) | ||
| Sensitivity to atmospheric CO2 concentration changes | 0.4 | 0 | 10 | – | Bao, Wutzler, et al. (2022) | ||
| Minimum optimal atmospheric CO2 concentration | 380 | 340 | 390 | ppm | Bao, Wutzler, et al. (2022) | ||
| CO2 fertilization intensity indicator | 2000 | 100 | 4,000 | ppm | Bao, Wutzler, et al. (2022) | ||
| (, models) | Light saturation curvature indicator | 0 | 0.05 | Bao, Wutzler, et al. (2022) | |||
| (, models) | Sensitivity to cloudiness index changes | 0.5 | 1 | – | Bao, Wutzler, et al. (2022) | ||
| (, , , models) | Optimal soil moisture | 0.26 | 0.01 | 0.99 | Bao, Wutzler, et al. (2022) | ||
| Sensitivity to soil moisture changes | −11 | −5 | −30 | – | Bao, Wutzler, et al. (2022) | ||
| Lag parameter for soil moisture effect | 0.98 | 0 | 1 | – | Bao, Wutzler, et al. (2022) | ||
| WAI (, , , models) | Available water capacity | 100 | 1 | 1,000 | mm | Bao, Wutzler, et al. (2022) | |
| Rate of evapotranspiration | 0.05 | 0.1 | Bao, Wutzler, et al. (2022) | ||||
| Multiplicative scalar for potential evapotranspiration | 1.2 | 0 | 5 | – | Trautmann et al. (2018) | ||
| Snow melt rate for temperature | 0.125 | 0 | 0.5 | Trautmann et al. (2018) | |||
| Snow melt rate for net radiation | 0.0375 | 0 | 0.125 | Trautmann et al. (2018) | |||
| Sublimation resistance | 0.44 | 0 | 3 | – | Bao, Wutzler, et al. (2022) |
One known limitation of the model is its tendency to overestimate GPP fluxes in water-limited ecosystems, as no explicit representation of soil moisture conditions was included (Mengoli et al., 2022, 2023). To address this drawback, we used the water availability index () as a proxy of soil moisture (Bao, Wutzler, et al., 2022; Boese et al., 2019; Tramontana et al., 2016). The index represents the spatial and temporal dynamics of plant-available water based on a simple hydrological model in which storage is controlled by precipitation and evapotranspiration. We further introduced a drought stress function that additionally scaled the GPP estimates of the model, and we denoted this new version as the model. We calibrated 10 parameters in the model, nine of which belong to the hydrological model and the drought stress function (Table 1). Further details on the implementation of the model, along with the drought stress function, can be found in Text S1 and S2, Figures S1 and S2 of Supporting Information S1.
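As an illustration of how such a water availability index can be computed, the following minimal single-bucket sketch captures the idea of a storage controlled by precipitation and evapotranspiration; it omits the snow and evapotranspiration sub-models of the original implementation, and all names are placeholders.

```python
import numpy as np

def water_availability_index(precip, evapotranspiration, awc, initial_storage=None):
    """Minimal bucket sketch of a water availability index (WAI).

    Storage is increased by precipitation, depleted by evapotranspiration, and
    bounded by the calibrated available water capacity `awc` (cf. Table 1).
    Returns the storage normalized by `awc`, a 0-1 proxy of plant-available
    water; the original formulation may differ in detail.
    """
    storage = awc if initial_storage is None else initial_storage
    wai = np.empty(len(precip))
    for t in range(len(precip)):
        storage = storage + precip[t] - evapotranspiration[t]
        storage = min(max(storage, 0.0), awc)  # keep the bucket within [0, awc]
        wai[t] = storage / awc
    return wai
```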
Semi-Empirical Model: Bao Model
Vegetation stores energy from absorbed solar radiation in the form of biochemical energy through photosynthesis. The efficiency of the photosynthetic apparatus in performing this energy conversion is termed light use efficiency (). In a LUE model, GPP is calculated as the product of the instantaneous , the photosynthetic photon flux density (), and the fraction of incident photosynthetically active radiation that is absorbed by vegetation (). The instantaneous reaches its maximum, that is, , when all environmental factors are optimal for photosynthesis. The instantaneous is determined as the product of and the partial sensitivity functions () for the different environmental factors controlling GPP, such as air temperature (), , available soil water supply (), absorbed photosynthetic photon flux (), the cloudiness index (, Table A1), and atmospheric CO2 concentration (Bao, Wutzler, et al., 2022; Horn & Schulz, 2011; Mäkelä et al., 2008).
In this study, we used the LUE model of Bao, Wutzler, et al. (2022), Bao et al. (2023), since it emerged as a robust representation from a systematic comparison across the large diversity of LUE formulations in the literature. The model selection followed a Bayesian approach that leveraged the evaluation of modeling performance across FLUXNET EC sites (Pastorello et al., 2020) when the model was forced and calibrated with daily data for each site. We denoted this model as the model when parameterized at the hourly scale and as the model when parameterized at the daily scale. The model is described in Equations 1–8, where , , , , and are partial sensitivity functions for , , , , and , respectively. In this case, and were calculated similarly to the implementation in the model, that is, with a simple hydrological model (Text S1 and Figure S1 of Supporting Information S1) and a drought stress function (Equations 7 and 8 are the same as Equations S1 and S2 of Supporting Information S1), respectively. Bold terms in Equations 1–8 are model parameters, and their initial values, units, and ranges are described in Table 1. The physical ranges of most parameters were based on Bao, Wutzler, et al. (2022), Bao et al. (2023), and Trautmann et al. (2018). The term, viz. Equation 4, also accounts for atmospheric CO2 concentration. The partial sensitivity functions range from zero to one (except the second part of Equation 4, which can be greater than one), where a value of zero completely suppresses GPP and a value of one completely favors it. In this study, we changed the denominator of Equation 2 relative to the original exponential function of Bao, Wutzler, et al. (2022), Bao et al. (2023), as the revised version produced a more realistic range of (Figure S3 of Supporting Information S1). The sensitivity functions and also consider a lag effect of and . The lag effect of temperature was considered for Temperate, Boreal, and Polar regions, where the first letter of the Köppen–Geiger (KG) climate class is “C,” “D,” or “E,” and that of soil water supply was considered for arid regions, where the first letter of the KG climate class is “B” (Beck et al., 2018; Rubel et al., 2017).
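The structure of this formulation can be summarized in a few lines of code; the sketch below assumes the partial sensitivities have already been evaluated and uses illustrative names rather than the actual model implementation.

```python
import numpy as np

def gpp_lue(lue_max, ppfd, fapar, sensitivities):
    """Sketch of the LUE formulation: GPP is the product of the maximum light use
    efficiency, all partial sensitivity functions, the absorbed fraction of PAR,
    and the photosynthetic photon flux density (in the spirit of Equation 1).

    `sensitivities` is an iterable of partial sensitivity values (temperature,
    VPD, soil water, light saturation, cloudiness, CO2, ...), each typically in
    [0, 1]; argument names are placeholders, not those of the Bao model code.
    """
    lue = lue_max * np.prod(list(sensitivities))
    return lue * fapar * ppfd
```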
Data Used
We selected 198 eddy covariance sites for which the required forcing and observation or derived data for model parameterization were available from the FLUXNET2015 data set (Pastorello et al., 2020; FLUXNET.org, 2024a). A list of these sites is provided in Table S8 of Supporting Information S1, and their spatial distribution is plotted in Figure S4 of Supporting Information S1. The variables used to force and parameterize the models, as well as data processing steps such as gap-filling and quality control, are described in detail in Table A1, Appendix B, and Appendix C. We prepared these data at both hourly and daily resolutions.
In our study, a total of 13 different PFTs (as defined in FLUXNET.org, 2024b) were represented: croplands (CRO; 19 sites), deciduous broadleaf forests (DBF; 25 sites), deciduous needle leaf forest (DNF; one site), evergreen broadleaf forests (EBF; 13 sites), evergreen needle leaf forests (ENF; 47 sites), grasslands (GRA; 35 sites), mixed forests (MF; nine sites), closed shrublands (CSH; three sites), open shrublands (OSH; 13 sites), savannas (SAV; six sites), permanent wetlands (WET; 20 sites), woody savannas (WSA; six sites), and land cover under snow for most of the year (SNO; one site). The major KG climate classes (Rubel et al., 2017; Beck et al., 2018; FLUXNET.org, 2024c) are represented by 12 tropical sites, 18 arid sites, 87 temperate sites, 71 boreal sites, and 10 polar sites. We also classified sites into nine climate–vegetation types, similar to Bao, Wutzler, et al. (2022), in which seven sites are tropical forests (TropicalF), five sites are tropical grassland (TropicalG), six sites are arid forest (AridF), 12 sites are arid grassland (AridG), 51 sites are temperate forest (TemperateF), 36 sites are temperate grassland (TemperateG), 52 sites are boreal forest (BorealF), 19 sites are boreal grassland (BorealG), and 10 sites have polar vegetation.
Model Parameterization
We primarily defined four different parameterization strategies, consisting of various subsets of data, to calibrate the model parameters controlling hourly GPP dynamics. These parameterization strategies were used to determine a vector of calibrated parameter values (a) for each site–year, (b) for each site, (c) for each PFT, and (d) for all sites at once (global parameterization). We also performed another parameterization per site using a modified cost function with an additional constraint on the IAV of GPP (). We chose parameterization strategies ranging from flexible to rigid to evaluate how model performance changes with the parameterization. We parameterized and forced the model and the model using hourly and daily data, respectively, to perform a comparative analysis (Table 2). The model and the model were only parameterized and forced using hourly data (Table 2).
Table 2 Description of Models and Tasks Accomplished With Each Specific Model
| Models | Description | Parameterization strategies | ||||
| Per site–year | Per site using | Per site | Per PFT | Global | ||
| P-model of Mengoli et al. (2022) parameterized using hourly data | a, d | a, d | a, d | a, d | a, d | |
| P-model of Mengoli et al. (2022) with an additional constraint on drought stress and parameterized using hourly data | a, b, d, e | a, b, d, e | a, b, d, e | a, b, d, e | a, b, d, e | |
| LUE model of Bao, Wutzler, et al. (2022) parameterized using hourly data | a, c, d, e | a, c, d, e | a, c, d, e | a, c, d, e | a, c, d, e | |
| LUE model of Bao, Wutzler, et al. (2022) parameterized using daily data | a, c, d | a, c, d | a, c, d | a, c, d | a, c, d | |
We used the Python (Python Core Team, 2021) implementation (pycma v3.3.0.1) of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen & Kern, 2004; Hansen et al., 2019) as our global search algorithm to find the values of the model parameters for which the cost function reached its minimum. CMA-ES is a derivative-free, evolutionary algorithm designed to find global minima in a rugged parameter space.
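A minimal usage sketch of pycma is shown below; the parameter vector, bounds, and the cost function stub are placeholders for the actual model-specific setup.

```python
import cma

def cost_function(params):
    """Placeholder: would run the forward model with `params` and return the scalar
    cost (e.g., 1 - weighted NNSE); a dummy quadratic is used here for illustration."""
    return sum((p - 1.0) ** 2 for p in params)

x0 = [18.0, 0.04, 10.0]          # illustrative initial values (e.g., acclimation time, LUE_max, T_opt)
lower = [1.0, 0.0, 5.0]          # illustrative lower bounds (cf. Table 1)
upper = [100.0, 0.13, 35.0]      # illustrative upper bounds
sigma0 = 2.0                     # initial step size of the search distribution

es = cma.CMAEvolutionStrategy(x0, sigma0, {"bounds": [lower, upper], "seed": 42})
while not es.stop():
    candidates = es.ask()                                      # sample candidate parameter vectors
    es.tell(candidates, [cost_function(x) for x in candidates])
best_params = es.result.xbest                                  # parameter vector with the lowest cost
```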
A robust cost function is a necessity for the numerical optimizer to find the global minimum. The cost functions for the , models () and the , models () were calculated as Equations 9 and 10, respectively, in the case of per-site–year and per-site parameterization. Here, is either a site or a site–year, depending on the parameterization type. For PFT-specific model parameterization, the cost functions were and for the , models and the , models, respectively. denotes a site and denotes the total number of sites in a specific PFT. In the case of global model parameterization, the cost functions were and for the model and the , models, respectively. denotes a site and denotes the total number of sites used in this study.
and were calculated (Equation 11) as a weighted normalized NSE, viz. NNSE (Hundecha & Merz, 2012), between the time series of good-quality data points (see Appendix B for the selection criteria) of EC-derived and simulated GPP and ET, respectively. The GPP and ET derived from EC measurements are denoted as and , respectively. The simulated GPP and ET are denoted as and (see Figure S1 of Supporting Information S1 for the calculation of ), respectively. We also considered in our cost function to better constrain the parameters of the simple hydrological model used in this study. The NNSE values (Nash & Sutcliffe, 1970) lie between zero and one, where one is the best and zero is the worst agreement between observed and simulated data. We used these normalized values so that minimizing always results in better model performance, in contrast to using , where can take values between negative infinity (worst agreement) and one (best agreement). In Equation 12, is the total number of good-quality data points from each timestep for a site–year or site . in Equation 13 is the random uncertainty (standard deviation of fluxes in a sliding window of 5 days and 1 hr around the time-of-day of the current timestamp) of NEE or ET at each timestep for a site–year or site . Similarly, in Equation 13 is the random uncertainty of the full NEE and ET time series for a site–year or site . The normalized random uncertainty in Equation 13 was used in the cost function to allocate higher and lower weight to EC-derived values with lower and higher uncertainties, respectively.
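A simplified sketch of such a weighted NNSE term is given below; the exact weighting and normalization of the original cost function may differ, and the uncertainty-based weights shown here only illustrate the idea of down-weighting noisy observations.

```python
import numpy as np

def weighted_nnse_cost(obs, sim, sigma_random):
    """Weighted NSE between observed and simulated fluxes, normalized to (0, 1]
    (NNSE) and returned as a term to be minimized. Observations with lower
    random uncertainty receive a higher weight, in the spirit of Equation 13.
    """
    obs, sim, sigma_random = map(np.asarray, (obs, sim, sigma_random))
    weights = 1.0 / (sigma_random / np.nansum(sigma_random))   # lower uncertainty -> larger weight
    nse = 1.0 - np.nansum(weights * (obs - sim) ** 2) / np.nansum(
        weights * (obs - np.nanmean(obs)) ** 2
    )
    nnse = 1.0 / (2.0 - nse)   # maps NSE in (-inf, 1] to NNSE in (0, 1]
    return 1.0 - nnse          # zero for a perfect fit, larger for worse agreement
```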
The and were introduced as regularizers in to avoid over-fitting of the sensitivity functions (Bao et al., 2023; Bao, Wutzler, et al., 2022). These cost function components ensure that the partial sensitivity functions are not penalized under ideal conditions and are not favored under non-ideal conditions. The ideal and non-ideal conditions were determined by constant thresholds applied to all sites. Equation 14 ensured that the partial sensitivity functions (Equation 2), only the left part of (Equation 4), (Equation 7), and (Equation 5) approach one when certain ideal environmental conditions ( [0 to 600 ], [0 to 1], [−5 to 40 C], [0–4,500 Pa], [0 to 1]) occur (these ranges are denoted by subscript ), so that in Equation 1 reaches its maximum potential. The factor in Equation 14 was included to match the ranges of all other components in the cost function for the , models (), so that all components had equal weight. Equation 15 penalized cases in which the values of (Equation 2), only the left part of (Equation 4), and (Equation 7) were greater than a certain threshold ( = 0.2, = 0.9, = 0.2) under non-ideal conditions ( 0 C, 2,000 Pa, 0.01) for photosynthesis.
In the case of per-site–year parameterization using the cost functions in Equations 9 and 10, we fitted the model so that the annual average of GPP could be captured well for each site–year, whereas in the case of per-site parameterization using the cost functions in Equations 9 and 10, the model was parameterized for each site. We performed another experiment as a balance between these two by using , similar to Desai (2010), to put an additional constraint on the IAV, and parameterized the , , , and models for each of the EC sites. The cost functions for the , models (Equation 16) and for the , and models (Equation 17) now include an additional term () to constrain the annual cumulative sum of GPP flux for each site . , , and (Equation 21) are the cumulative sums of , , and from the start of each year to timestep for each site , respectively.
Simulating and Evaluating GPP Estimates
Forward Runs
In the case of the site–year parameterization, we performed a forward run for each site–year using the respective set of calibrated parameter values and forcing data for that year. Afterward, we concatenated from all the years for a given site to assess model performance. For per–site parameterization using , and per–site parameterization, we used site-specific values of calibrated parameters to perform site-level model evaluation. We also applied calibrated model parameters for a certain PFT to simulate GPP at all the sites which belong to a certain PFT. Similarly, for the global parameterization, a single set of calibrated parameter values was used to simulate GPP for each site.
Model Performance Metrics
We performed forward runs at an hourly scale and averaged the hourly simulations to daily, weekly, monthly, and annual temporal frequencies to calculate model performance measures at different temporal aggregations. Model performance was only evaluated for temporal aggregations from daily to annual for the model. We applied a data screening procedure (Appendix C) before calculating model performance measures.
We evaluated how well a model can simulate the IAV of GPP based on how well it simulated the annual average GPP for a site. In this study, we performed most of our analysis using the NSE (Nash & Sutcliffe, 1970) and the normalized NSE, viz. NNSE, as the NSE indicates the degree to which the scatter between observed and simulated data fits the 1:1 line. We used Equation 22 to calculate the NSE between a reference () and a predicted () variable, where is the total number of time steps and , are the values of the reference and predicted variables at timestep . In addition, we calculated the square of the Pearson correlation coefficient () (Kirch, 2008), which indicates whether the dispersion of observed and simulated data matches; for an unbiased model, NSE values will be close to the values of . If a model is systematically biased, it will result in high values but poor NSE values (Krause et al., 2005). We also calculated the Root Mean Squared Error (Chai & Draxler, 2014) to quantify how closely the mean of the simulated data matches the mean of the observed data.
Moreover, using Equations 23–25, we decomposed the NSE values into linear correlation (), relative variability (), and bias () in some cases to investigate which of these improved or deteriorated between different model parameterization strategies (Gupta et al., 2009). In Equations 24 and 25, and are the standard deviations of and , respectively, and and are the means of and , respectively. We calculated these metrics using the Python (Python Core Team, 2021) package Permetrics v1.5.0 (Van Thieu, 2023; Van Thieu & Mirjalili, 2023), and the definition of each model performance metric can be found in the package documentation.
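The decomposition can be reproduced with a few lines of NumPy, as sketched below under the standard Gupta et al. (2009) formulation; this is an illustration rather than the Permetrics implementation used in the study.

```python
import numpy as np

def nse_decomposition(obs, sim):
    """Decompose NSE into linear correlation (r), relative variability (alpha),
    and normalized bias (beta_n) following Gupta et al. (2009):
    NSE = 2 * alpha * r - alpha**2 - beta_n**2.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]                       # linear correlation
    alpha = np.std(sim) / np.std(obs)                     # relative variability
    beta_n = (np.mean(sim) - np.mean(obs)) / np.std(obs)  # bias normalized by obs std
    nse = 2.0 * alpha * r - alpha**2 - beta_n**2
    return nse, r, alpha, beta_n
```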
Estimating Uncertainties in Annual GPP
GPP is not directly measured using the EC technique (Foken et al., 2011; Montgomery, 1948; Swinbank, 1951). Rather, GPP is derived from NEE using a flux partitioning method (Lasslop et al., 2010; Reichstein et al., 2005). The measurement of NEE is affected by low-turbulence conditions, during which the EC system misses the carbon flux due to advective transport (Aubinet et al., 2010). Low-turbulence conditions were identified by a friction velocity threshold (), and NEE measurements during these conditions were discarded and gap-filled (Papale et al., 2006). Two hundred values of the threshold were estimated using a bootstrap approach for each site–year, and finally 40 different values (chosen from the 1.25 percentile to the 98.75 percentile with a step of 2.5) of were used to produce 40 different realizations of the NEE data. These thresholds were either estimated per year using the variable threshold (VUT) method, where the threshold of a year depends on data from that year and its neighboring years, or estimated using data from all years and kept constant across years, which is known as the constant threshold (CUT) method (Pastorello et al., 2020). Thereafter, seven representative NEE data sets were produced as percentiles (XX) 5, 16, 25, 50, 75, 84, and 95 of the 40 different NEE values for each of the VUT and CUT methods. These seven NEE data sets were further partitioned to estimate GPP using the daytime (DT) and night-time (NT) partitioning methods (Lasslop et al., 2010; Reichstein et al., 2005).
Here, we used GPP_NT_VUT_USTAR50 from the FLUXNET2015 data set (Pastorello et al., 2020) as our reference GPP, or , for model calibration and evaluation (see Section 2.2 and Appendix A). These GPP values were derived using night-time partitioning from NEE data (NEE_VUT_USTAR50) for which the VUT method was considered and the 50th percentile value of the thresholds was applied. However, the GPP data disseminated in the FLUXNET2015 data set carry a systematic uncertainty associated with the choice of NEE data (depending on percentile XX) and partitioning method. This uncertainty can be large when the fluxes are aggregated to the annual scale. We annually aggregated four other GPP variables (GPP_NT_VUT_05, GPP_NT_VUT_95, GPP_DT_VUT_05, GPP_DT_VUT_95), derived using either the daytime or the night-time partitioning method and from the 5th (NEE_VUT_05) and 95th (NEE_VUT_95) percentile NEE values, and screened for good site–years with their respective quality control flags (see Appendix C). We quantified the uncertainty range of annual for a site–year as spanning from the minimum of annual GPP_NT_VUT_05 and GPP_DT_VUT_05 to the maximum of annual GPP_DT_VUT_95 and GPP_NT_VUT_95. Thereafter, we also quantified the fraction of site–years at a site for which the annual estimated from the different experiments was within the uncertainty range of . The values of were also affected by random uncertainties in the NEE flux. We used hourly random uncertainties in the NEE flux to weigh the contribution of a GPP value from a given hour in the cost function (Section 2.3). However, random uncertainty at the annual scale was not considered, as it is greatly reduced by aggregation (Hollinger & Richardson, 2005; Tramontana et al., 2016).
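The sketch below illustrates how such an uncertainty envelope could be derived from annually aggregated FLUXNET2015 variables; the DataFrame layout is an assumption for illustration, while the column names follow the variables listed above.

```python
import numpy as np
import pandas as pd

def annual_gpp_uncertainty(annual: pd.DataFrame) -> pd.DataFrame:
    """Per site-year uncertainty envelope of EC-derived annual GPP: from the
    minimum of the two 5th-percentile products to the maximum of the two
    95th-percentile products (night-time and daytime partitioning).
    `annual` is assumed to hold annually aggregated values of the four variables.
    """
    lower = np.minimum(annual["GPP_NT_VUT_05"], annual["GPP_DT_VUT_05"])
    upper = np.maximum(annual["GPP_DT_VUT_95"], annual["GPP_NT_VUT_95"])
    return pd.DataFrame({"gpp_lower": lower, "gpp_upper": upper})

# A simulated annual GPP would then be counted as consistent with the observations
# for a site-year if gpp_lower <= simulated_annual_gpp <= gpp_upper.
```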
Factors Influencing Model Performance
We selected potential factors that can affect model performance at different temporal resolutions. These factors are of two types. Some factors were determined by our experiment design, including model type ( model, model, model, and model), parameterization strategy (per site–year, per site using , per site, per PFT, and global parameterization), and the number of years with good-quality data (Appendix C) at a site. Other factors represent site-specific characteristics, including PFT, KG climate class, and climate–vegetation type.
First, we conducted Levene's test (Levene, 1960) to determine whether the assumption of homoscedasticity was fulfilled across groups of the controlling factors. Then, we performed an N-way Analysis of Variance (ANOVA) (Kaufmann & Schering, 2014) with the potential controlling factors to determine which of them played a major role in determining model performance at the hourly and annual temporal scales. For the analysis at the hourly scale, the model was not included, as this model produced simulations at a daily scale. We performed two N-way ANOVA analyses: one including the performance of the model and one excluding it. The Levene's test and N-way ANOVA analyses were implemented using SciPy v1.11.3 (Virtanen et al., 2020) and statsmodels v0.14.0 (Seabold & Perktold, 2010), respectively.
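A sketch of these two steps is given below; the long-format table, its values, and the column names are hypothetical and include only a subset of the factors used in the full analysis.

```python
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format table: one row per model run with the NNSE and a subset
# of the candidate factors (the full analysis also used PFT, KG class, and
# climate-vegetation type); values are illustrative only.
performance = pd.DataFrame({
    "NNSE": [0.82, 0.74, 0.66, 0.61, 0.85, 0.71, 0.58, 0.49],
    "model": ["A"] * 4 + ["B"] * 4,
    "param_strategy": ["site_year", "site", "pft", "global"] * 2,
    "n_good_years": [10, 10, 7, 7, 5, 5, 12, 12],
})

# Levene's test: is the NNSE variance homogeneous across parameterization strategies?
groups = [g["NNSE"].to_numpy() for _, g in performance.groupby("param_strategy")]
levene_stat, levene_p = stats.levene(*groups)

# N-way ANOVA; the type-2 table gives the sum of squares attributable to each factor,
# from which percentage contributions to the SSR can be computed.
fit = ols("NNSE ~ C(model) + C(param_strategy) + n_good_years", data=performance).fit()
anova_table = sm.stats.anova_lm(fit, typ=2)
```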
Evaluating GPP Estimates in Water–Limited Ecosystems
We investigated whether explicitly accounting for the drought stress function in the model improved its performance at arid sites. For this purpose, we chose the aridity index (AI) to determine which sites were arid or semi-arid, as this index provides a numerical representation of moisture availability (Zomer et al., 2022) at a location. The AI values were calculated by dividing the average precipitation () per hour by the average potential evapotranspiration () per hour over the whole observation period at a site.
We drew examples from a few site-specific results to highlight different aspects of the behavior of the and models for ecosystems with contrasting soil moisture controls on GPP and with a larger availability of good-quality measurements. For this purpose, we chose a water-limited semi-arid site (annual average precipitation of 318 mm) in central Australia (Alice Springs, AU-ASM). This site features a complex mixture of Mulga woodland and savanna (Cleverly et al., 2013; Pastorello et al., 2020). In contrast, we also highlighted the behavior of the and models at an irrigated cropland (Mead - irrigated continuous maize site, US-Ne1) in the midwestern USA (Amos et al., 2005; Pastorello et al., 2020).
Effect of Temporal Resolution of Data on Model Performance
We parameterized the LUE model of Bao, Wutzler, et al. (2022) with hourly and daily data for model and model, respectively. We performed a comparison between these two versions of the model to highlight whether the resolution of data used for model parameterization can substantially affect the prediction of the annual average or IAV of GPP fluxes. Here, we also drew a site-specific example from an energy–limited deciduous forest in central Germany (Hainich, DE-Hai) as this site had a very long observation period (Knohl et al., 2003).
Evaluating Modeling Experiments of Various Complexities
We formulated our experiments using models and parameterization strategies consisting of varying numbers of model parameters to be calibrated. The number of parameters calibrated for a detailed parameterization strategy, such as per site–year parameterization was substantially higher than a generic parameterization strategy, such as global parameterization. We used Akaike's Information Criterion (AIC) to investigate whether a complex modeling experiment with a higher number of parameters can better simulate GPP (Burnham & Anderson, 2004).
Following the recommendations of Burnham and Anderson (2004), we used Equation 26 to calculate the AIC when , where is the total number of observations and is the total number of parameters. Otherwise, we used a corrected version of the AIC (, Equation 27). Although the values of AIC or can lie in any range, the lowest value of AIC or identifies the preferred modeling experiment. and are the th observations of EC-derived GPP and simulated GPP, respectively, in Equations 26 and 27. We considered from all four model variations, that is, the model, model, model, and model, for the calculation of AIC or . We calculated the AIC at hourly and daily aggregations by concatenating good-quality (Appendix C) hourly or daily data, and daily averages of and , from all days and all sites. Similarly, we used monthly and annual aggregations to calculate at the monthly and annual scales, respectively. was calculated at the monthly and annual aggregations, as was usually smaller than in these cases. The value of was the total number of model parameters calibrated for all site–years, for all sites, for all PFTs, and for a specific model in the case of per-site–year parameterization, per-site parameterization using , per-site parameterization, per-PFT parameterization, and global parameterization, respectively.
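The least-squares form of these criteria, as we read it from Burnham and Anderson (2004), is sketched below; the aggregation of residuals across sites and the counting of calibrated parameters follow the text above and are simplified here.

```python
import numpy as np

def aic_least_squares(obs, sim, k):
    """AIC for least-squares model selection: AIC = n * ln(SSE / n) + 2k, with the
    small-sample correction AICc = AIC + 2k(k + 1) / (n - k - 1) applied when
    n / k < 40 (Burnham & Anderson, 2004). `k` is the total number of calibrated
    parameters of the modeling experiment and `n` the number of observations.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    n = obs.size
    sse = np.sum((obs - sim) ** 2)      # sum of squared errors of the experiment
    aic = n * np.log(sse / n) + 2 * k
    if n / k < 40:                      # corrected AIC when the sample is small relative to k
        aic += (2 * k * (k + 1)) / (n - k - 1)
    return aic
```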
Difference in Model Performances Between PFTs
We studied the distribution of model performances of the model and the model for different parameterization strategies across PFTs and climate–vegetation types (Section 2.2). We performed non-parametric statistical significance testing using a two-sample Kolmogorov-Smirnov (K-S) test (Hodges, 1958) between a given pair of parameterization strategies for a given model and PFT to test if model performance obtained from a parameterization strategy is significantly different from another for a PFT and a model. Similarly, we performed the K-S test between model performances of the model and the model for a given parameterization strategy and a PFT to test if model performances were significantly different between two models. We performed the K-S test using SciPy v1.14.1 (Virtanen et al., 2020).
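The comparison reduces to a call to scipy.stats.ks_2samp for each pair of distributions, as in the sketch below with illustrative NNSE values.

```python
from scipy.stats import ks_2samp

# Illustrative per-site NNSE values of one model and one PFT for two strategies.
nnse_site_year = [0.91, 0.84, 0.78, 0.88, 0.69]   # per-site-year parameterization
nnse_global = [0.72, 0.61, 0.55, 0.70, 0.49]      # global parameterization

statistic, p_value = ks_2samp(nnse_site_year, nnse_global)
significantly_different = p_value < 0.05  # reject equal distributions at, e.g., the 5% level
```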
Simulating GPP Peaks
We assessed model performance in predicting peak . We defined peak and peak as the 90th percentiles of hourly () and (), respectively, following the concept of good hours by Zscheischler et al. (2016) and Fatichi and Ivanov (2014). We calculated and for each site–year considering only good quality hourly data (Appendix B). We compared the ratios of peak from model and model to for each parameterization strategy in order to identify possible biases.
We furthermore investigated whether improving the simulation of the peaks of improved the simulation of the IAV of GPP. We calculated the NNSE between and () from all site–years at a site, considering only good site–years and only sites with more than 3 years of good-quality data (Appendix C), for a parameterization strategy . Similarly, we calculated the NNSE between the annual averages of and () for sites with more than 3 years of good-quality data (Appendix C) for a parameterization strategy . Then, the differences between and were calculated for a pair of parameterization strategies, where and are two different parameterization experiments, for both the model and the model (Equations 28 and 29). The correlation between and was then investigated to study whether a certain parameterization strategy for a given model better captured the peaks and thus contributed to higher annual model performance.
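The sketch below outlines how the peak values and the NNSE-based differences of Equations 28 and 29 could be computed; array names are placeholders, and the quality screening described in Appendices B and C is omitted.

```python
import numpy as np

def peak_gpp(hourly_gpp, q=90):
    """Peak GPP of one site-year: the 90th percentile of (good-quality) hourly values."""
    return np.nanpercentile(hourly_gpp, q)

def nnse(obs, sim):
    """Normalized Nash-Sutcliffe efficiency between two per-site-year series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    nse = 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)
    return 1.0 / (2.0 - nse)

# For one site and two parameterization strategies e1 and e2 (per-site-year arrays):
# d_peak = nnse(peak_obs, peak_sim_e1) - nnse(peak_obs, peak_sim_e2)            # Eq. 28 (sketch)
# d_annual = nnse(annual_obs, annual_sim_e1) - nnse(annual_obs, annual_sim_e2)  # Eq. 29 (sketch)
# A positive correlation of d_peak with d_annual across sites indicates that better
# simulated diurnal peaks go along with better annual model performance.
```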
Results
Overall Model Performance
All four models, that is, the , , , and models, performed significantly better at the hourly scale than at the annual scale (Figure 2). The use of an additional constraint on the IAV, that is, , did not contribute to better model performance across sites at the annual scale; it performed close to the per-site parameterization and poorer than the site–year parameterization (Figure 2). The median model performance was highest for the model parameterization per site–year among all parameterization strategies (per site–year, per site using , per site, per PFT, and global parameterization) for all four models (Table D1). Model parameterization per site–year also produced the best model performance at all temporal aggregation levels, including annual aggregation (Figures 2, S5, and Table S2 of Supporting Information S1). The model performed substantially better than the model for the majority of sites at all temporal aggregation levels, as it explicitly considered site-specific water availability (Figures 2, S5, and Table S2 of Supporting Information S1). Comparison of model performances at different temporal aggregations also revealed that the and models performed slightly better than the model across all timescales (hourly, daily, weekly, monthly, and annual), as the and models were more flexible than the model and captured the ecosystem response with a broad range of parameters (Figure 2, Table D1, Figure S5 and Table S2 of Supporting Information S1). For example, the median NNSE values at the hourly resolution were 0.827 and 0.855 for the model and the model, respectively. Conversely, at the annual resolution, the median NNSE values were 0.543 and 0.628 for the model and the model, respectively.
[IMAGE OMITTED. SEE PDF]
The annual average of was within the uncertainty range of (as defined in Section 2.4.3) for most site–years at most sites when the (mean fraction of site–years per site: 0.87), (mean fraction of site–years per site: 0.93), and (mean fraction of site–years per site: 0.94) models were parameterized per site–year (Figure 3 and Table E1). The model overestimated the annual beyond the uncertainty range of in most cases for all parameterization methods; the mean fraction of site–years per site within the uncertainty range of was 0.13 (Figure 3 and Table E1). The mean fractions of site–years per site for which was within the uncertainty range were similar between site-specific and site–year-specific parameterization for the , , and models (Table E1). However, for PFT-specific and global model parameterization, the annual average of was unreliable and fell outside the uncertainty range of in most cases (Figure 3).
[IMAGE OMITTED. SEE PDF]
Factors Influencing Variability in Model Performance
We summarized the percentage contributions of the factors that influenced model performance at the hourly and annual scales and found that most of the variability in model performance came from how we designed our modeling experiments (Figure 4). Model type was a crucial factor when the model was included in the N-way ANOVA analysis, as this model had comparatively poor performance at both the hourly and annual scales and resulted in greater variability in the NNSE values (70.1% and 63.5% contributions to the sum of squares of the regression, viz. SSR, at the hourly and annual scale, respectively). We then excluded the model from further analysis to uncover the other factors behind the model performance and found that, at the hourly scale, the model performance varied the most across the groups of KG classes (33.9% contribution to the SSR), followed by parameterization type (31.0% contribution to the SSR) and climate–vegetation type (25.4% contribution to the SSR) (Figure 4). At the annual scale, however, the parameterization strategy strongly affected the model performance (62.3% contribution to the SSR), as per-site–year parameterization usually simulated the annual GPP better than the other parameterization strategies. The number of good years (Appendix C) used for calculating the annual NNSE also exerted a small influence (4.9% contribution to the SSR) on the annual model performance. In general, there were only slight performance differences between models when the model was not considered, and model parameterization played a bigger role in the variability of model performance.
[IMAGE OMITTED. SEE PDF]
Effect of Drought Stress on Model Performance
The performance of the model in predicting the annual average of for each site–year substantially improved in comparison to the model (from an NSE of −1.39 to 0.92) after the explicit consideration of soil water supply in the model (Figure 5). Most of this improvement came from the better prediction of the annual average of at the arid and semi-arid sites (with AI values lower than 0.5). For the semi-arid site (AU-ASM), the predictive performance of the model for all parameterization strategies benefited greatly from the explicit inclusion of the soil water supply constraint (Figure 6). The systematic bias in the model simulations was also reduced after the inclusion of the drought stress constraint, and it further decreased from the generalized to the detailed parameterization strategies. Although coupling a simple hydrological model, which calculated water availability based on precipitation and evapotranspiration, and including the drought stress function generally improved the model for most site–years at an arid site, the model failed to capture the (Text S3, Figure S6 of Supporting Information S1) at an irrigated cropland site (US-Ne1), as the simple hydrological model we used to calculate water availability lacked a representation of human management.
[IMAGE OMITTED. SEE PDF]
[IMAGE OMITTED. SEE PDF]
Effect of Temporal Resolution of the Data on Model Performance
The use of hourly data to constrain the model parameters and the aggregation of hourly values of to the annual scale did not have a significant effect on model performance in simulating the annual average of for each site–year (Figure 7), in comparison to the parameterization of the same model with daily data, that is, the model. The value of the NSE decreased from 0.961 to 0.891 for the model compared to the model, and both models performed largely similarly. Here, for the model, we also focus on a site-specific example at a deciduous broadleaf forest site (DE-Hai) in central Germany, where the model proved capable of simulating the annual average of flux relatively well when parameterized for each site–year and each site (Figure 8). However, was underestimated in the cases of PFT-specific and global parameterization. For this specific site, the model performed relatively better than the model (Figures 8 and S7 of Supporting Information S1).
[IMAGE OMITTED. SEE PDF]
[IMAGE OMITTED. SEE PDF]
Role of Parameterization Strategies on Model Performance
Model performance at the annual scale generally increased with more detailed parameterization strategies (Figure 9). For the model, the median differences in annual model performance between the most detailed parameterization strategy, that is, site–year parameterization, and the other detailed parameterization strategies, that is, per-site parameterization using and per-site parameterization, were small: 0.12 and 0.11, respectively. In contrast, the median differences in annual model performance between site–year parameterization and the generalized parameterization strategies, that is, PFT-specific and global parameterization, were considerably larger: 0.28 and 0.37, respectively. Similarly, for the model, the median differences in annual model performance between site–year parameterization and the other detailed parameterization strategies, that is, per-site parameterization using and per-site parameterization, were 0.16 and 0.15, respectively, whereas the median differences between site–year parameterization and the generalized strategies, that is, PFT-specific and global parameterization, were 0.34 and 0.39, respectively. The positive values of these median differences confirm that site–year parameterization achieved the highest median performance among the five parameterization strategies.
[IMAGE OMITTED. SEE PDF]
At an hourly scale, differences in model performance between a pair of similar parameterization strategies, such as a pair of detailed parameterization (i.e., between site–year-specific and site-specific) or a pair of generalized parameterization (i.e., between per PFT and global) approaches for both models were small (Figure S8 of Supporting Information S1). However, this difference can be higher between a detailed and a generalized model parameterization strategy. The median differences in hourly NNSE between site–year-specific and site-specific model parameterization were 0.02 and 0.01 for the model and model, respectively. In contrast, the median differences in hourly NNSE between site–year-specific and global model parameterization were 0.11 for both and models.
The median differences in annual model performance between per-site parameterization using and per-site parameterization were relatively small (0.01 and 0.00 for the model and the model, respectively), which shows that the additional constraint on the IAV of GPP flux in the cost function did not substantially improve annual model performance. At the hourly scale, the median differences in model performance between per-site parameterization using and per-site parameterization were negligible (0.00 for both the and models). Thus, although the per-site parameterization using did not improve the annual model performance, it also did not degrade the hourly model performance.
Differences Between Modeling Experiments of Various Complexities
The lowest AIC values were obtained for per-site–year parameterization for all models at the hourly and daily scales or aggregations, suggesting that the sum of squared errors () was substantially reduced even when a comparatively complex parameterization strategy with a large number of model parameters was chosen (Table 3). The AIC values gradually increased from per-site–year, per-site, and per-PFT to global parameterization at the hourly and daily scales or aggregations for all three models, that is, the model, the model, and the model (Table 3). The semi-empirical models, that is, the model and the model, also mostly had lower AIC values than the optimality-based model, even though more parameters were calibrated for these models (Table 3). At the daily scale, the model had the lowest AIC for all parameterization experiments because it was parameterized at the daily scale, whereas for the other two models, parameterization and forward runs were performed at an hourly scale and the simulations were then aggregated to daily resolution. The model was not included in the AIC or analysis, as the previous results showed that this model significantly underperformed compared to the other models, which would always result in higher AIC or values.
Table 3 Akaike's Information Criterion (AIC) or Corrected AIC () Values for Modeling Experiments of Various Complexities
| Temporal scale/ aggregation | Models | Parameterization strategies | ||||
| Per site–year | Per site using | Per site | Per PFT | Global | ||
| Hourly scale (AIC) | ||||||
| Daily scale/ aggregation (AIC) | ||||||
| Monthly aggregation () | ||||||
| Annual aggregation () | ||||||
At the monthly and annual scales, we show the differences in values between the , , and models for the same parameterization strategy, and not between parameterization strategies within the same model. The reason is that the values largely depend on the relationship between the sample size, that is, , and the total number of calibrated parameters, that is, . The values of became very large even when a substantially smaller was obtained, and they became unreliable when the value of was close to . For example, at monthly aggregation, per-site–year parameterization of the model had a very high value of even though it had the lowest among all five parameterization strategies (Tables S3, S4, and S5 of Supporting Information S1). The model proved better able to capture the seasonal cycle, that is, the monthly GPP estimates, than the other two models for most parameterization experiments when the number of calibrated parameters is taken into account (Table 3). However, the model had the lowest value in the case of per-site parameterization using and per-site parameterization at monthly aggregation. In contrast, at the annual scale, the model mostly had the lowest values, and some of the experiments also suffered from the above-described unreliable estimates, where and had similar values (Tables 3, S3, S4, and S5 of Supporting Information S1).
Model Performances Across Different Plant–Functional Types
Generally, better model performance was achieved with both the model and the model when more detailed parameterization strategies were used (Figure 10). In this analysis, we removed the per-site parameterization experiment using , as it performed very similarly to the per-site parameterization, and we did not consider the model, as it produced poor performance across all PFTs.
[IMAGE OMITTED. SEE PDF]
The highest median NNSEs were obtained with per-site–year parameterization for almost all PFTs and both models. The only exception was WSA, for which the median model performance (median NNSE: 0.85) was marginally higher for the per-site parameterization experiment using than for the per-site–year parameterization (median NNSE: 0.84) of the model. For the model parameterization experiments, the highest median NNSE was found for CSH for per-site–year parameterization (median NNSE: 0.88), DBF for per-site parameterization (median NNSE: 0.85), CSH and DBF for per-PFT parameterization (median NNSE: 0.81), and CSH for global parameterization (median NNSE: 0.80). However, CSH had only three sites, and the highest median model performance for CSH should therefore be interpreted with caution. For the model parameterization experiments, the highest median NNSE was found for DBF for per-site–year parameterization (median NNSE: 0.89), DBF and MF for per-site parameterization (median NNSE: 0.87), DBF for per-PFT parameterization (median NNSE: 0.82), and DBF and MF for global parameterization (median NNSE: 0.81). We found similar results for the climate–vegetation types (Section 2.2), where a more detailed parameterization strategy achieved higher model performance than a generalized parameterization strategy (Text S5 and Figure S9 of Supporting Information S1).
Specifically, we found the model performance significantly differed between per–site–year parameterization and global model parameterization for most PFTs and both models (Figure 10). Model performance was also similar between a pair of detailed parameterization strategies (per–site–year and per–site parameterization) and general parameterization strategies (per-PFT and global parameterization) for most PFTs. For PFTs with very few sites, such as CSH, DNF, SNO, and WSA, statistical significance testing showed that the model performance was similar among parameterization strategies. However, statistical significance tests can be unreliable for a few samples. The model performance between model and the model for a specific parameterization strategy was similar for most PFTs (Table S6 of Supporting Information S1). Similarly, model performance between both models for a specific parameterization strategy was also identical for most PFTs (Table S7 of Supporting Information S1).
Correlation Between Annual Model Performance and Model Performance in Simulating Diurnal GPP Peaks
One of the crucial reasons behind the poor annual model performance (Figure 2 and Table D1) can be the inability of both models to capture the peaks of the diurnal GPP cycle (Figure S10 of Supporting Information S1). Specifically, the peaks were strongly underestimated in the case of global parameterization: the median ratio of simulated to observed diurnal GPP peaks was 0.77 and 0.84 for the two models, respectively. The underestimation generally decreased with more detailed parameterization strategies, with little difference between the models; the median ratios were 0.95 and 0.93 for site–year parameterization of the two models, respectively. Moreover, the lower interquartile range (IQR) of these ratios underlines the importance of site–year parameterization, compared to per-PFT or global parameterization, for reliably capturing the peak of the diurnal cycle at most sites and for attaining better model performance at the sub-daily scale. The IQR was 0.1 for both models in the case of site–year parameterization, 0.44 and 0.37 for the two models, respectively, in the case of PFT-specific parameterization, and 0.44 and 0.49, respectively, in the case of global parameterization.
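The following minimal sketch shows how such a ratio of simulated to observed diurnal GPP peaks, and its median and IQR across site–years, can be computed; the array contents are hypothetical, and the exact peak definition used in the study is not restated here.

```python
import numpy as np

def peak_ratio(gpp_sim, gpp_obs):
    """Ratio of the simulated to the observed peak of the diurnal GPP cycle
    for one site-year (arrays of hourly GPP)."""
    return np.nanmax(gpp_sim) / np.nanmax(gpp_obs)

# Hypothetical ratios for a set of site-years under one parameterization
# strategy (illustrative numbers only).
ratios = np.array([0.95, 0.88, 1.02, 0.76, 0.91, 0.83, 0.97])

median = np.median(ratios)
q25, q75 = np.percentile(ratios, [25, 75])
print(f"median = {median:.2f}, IQR = {q75 - q25:.2f}")
```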
We further found that when a parameterization strategy better simulated the diurnal GPP peaks for each site–year, it also produced comparatively better annual model performance for that site, as demonstrated by the positive Pearson correlation coefficients (Figure 11). Here too, a detailed parameterization strategy, such as site–year parameterization, resulted in a better simulation of the peaks, and thus better annual model performance, at most sites compared with a generalized strategy, such as global parameterization. When comparing site–year parameterization with global parameterization, 91% and 89% of sites had both better-simulated peaks and correspondingly better annual performance for the two models, respectively. When comparing per–site parameterization using the IAV constraint with per–site parameterization, only 34% and 37% of sites showed such a joint improvement for the two models, respectively. This indicates that adding a constraint related to the IAV of GPP to the cost function during model parameterization did not improve the prediction of peak GPP values for most sites.
[IMAGE OMITTED. SEE PDF]
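As an illustration of the correlation analysis described above, the sketch below computes a Pearson correlation between per-site improvements in peak simulation and improvements in annual NNSE when switching between two strategies; all variable names and numbers are hypothetical and synthetic, not results from this study.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical per-site quantities (illustrative only): the improvement in
# simulating diurnal GPP peaks and the improvement in annual NNSE when
# moving from global to site-year parameterization.
delta_peak_skill  = rng.normal(0.1, 0.05, size=50)
delta_annual_nnse = 0.8 * delta_peak_skill + rng.normal(0, 0.03, size=50)

r, p = pearsonr(delta_peak_skill, delta_annual_nnse)
frac_both_positive = np.mean((delta_peak_skill > 0) & (delta_annual_nnse > 0))
print(f"Pearson r = {r:.2f} (p = {p:.3f}); "
      f"{100 * frac_both_positive:.0f}% of sites improved in both")
```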
Discussion
Uncertainties in Modeling Experiments
Any model–data–integration study is prone to uncertainties related to both the data and the model. The EC data set used in our study has several well-known uncertainties. For example, NEE measurements using the eddy covariance technique can be uncertain due to the accumulation of atmospheric CO2 under the canopy at night (storage) and sudden turbulent mixing in the morning when the stable night-time boundary layer breaks up, or because of advection of atmospheric CO2 out of the control volume sampled by the eddy covariance system (Aubinet, 2008; D. Baldocchi et al., 2000; Jocher et al., 2018). The GPP fluxes that we used were derived from NEE measurements by extrapolating the night-time respiration of the ecosystem to daytime (Reichstein et al., 2005). GPP can also be estimated with another well-known algorithm, the daytime partitioning method (Lasslop et al., 2010). We preferred night-time partitioning because only respiration is modeled in this method, whereas in daytime partitioning both GPP and respiration are modeled, resulting in higher prediction errors. The uncertainties in our modeling results due to the choice of partitioning algorithm should be small, as quantified in a previous study by Desai et al. (2008). Papale et al. (2006) proposed a standardized set of flux correction techniques for the above-described issues related to the EC method and for removing random spikes in EC measurements. Many of these correction techniques were adapted in the standardized ONEFlux pipeline used to produce the FLUXNET2015 data set (Pastorello et al., 2020). Papale et al. (2006) also found that annual NEE, and fluxes derived from NEE such as GPP and TER, have an inherent uncertainty well below 100 g C m⁻² yr⁻¹. In our study, we found that the annual GPP simulated in the experiments with detailed parameterization strategies was mostly within the uncertainty range of EC-derived GPP arising from the choice of different variables produced by the data processing pipeline. However, quantifying the full uncertainty in EC-derived GPP related to measurement errors, random noise, data gaps and gap-filling methods, flux correction methods, and flux partitioning methods, as well as propagating all these uncertainties to various temporal scales, is very challenging and beyond the scope of this study. Similarly, the simulated GPP can have various uncertainties related to model structure and the simplified representation of ecosystem functions (such as stomatal conductance, photosynthesis, and leaf energy balance), forcing data, and parameters (Schaefer et al., 2012; Zheng et al., 2018). For example, the Bao model used in our study is based on the big-leaf assumption. In contrast, differentiating between sunlit and shaded leaves (the two-big-leaf assumption) in a similar model can lead to higher performance under certain conditions, such as hot and dry conditions at daily and weekly scales (Bao, Ibrom, et al., 2022). However, the two-big-leaf model was equally poor at explaining IAV. In this study, we aimed to address some of these uncertainties by adopting two main model structures and various parameterization schemes.
We also used ET in the cost function, which is equivalent to the latent heat flux. The mismatch between the sum of the latent, sensible, and ground heat fluxes and the net radiation calculated from incoming and outgoing radiation, the so-called lack of energy–balance closure, remains a long-standing challenge of EC measurements (Foken, 2008; Mauder et al., 2020; Zhang et al., 2024). Quality control of millions of data points at an hourly scale was also challenging, especially when merging data from various sources, such as in-situ measurements, modeled re-analysis data, and remote sensing-based estimates. Another major uncertainty arises from the mismatch between the footprint of the EC towers and the grid of the remote sensing data used to calculate vegetation indices (Chu et al., 2021). In addition, the classification of sites based on a simple PFT scheme may not accurately represent the vegetation of some sites. For example, a site at Alice Springs (AU-ASM) in central Australia was classified as a savanna in FLUXNET2015 (Pastorello et al., 2020). In fact, this site is dominated by a discontinuous canopy of Mulga (Acacia aneura), which has needle leaves and a seasonal grassy understory (Cleverly et al., 2013), and it could equally be classified as a woody savanna. An arctic site at Bayelva (SJ-Blv) has a combination of snow, wet ground, and specific tundra vegetation (Boike et al., 2018), which is not well represented by the snow classification of FLUXNET2015 (Pastorello et al., 2020). Another limitation of the data set is that sites are mostly clustered in Europe and North America and hence do not necessarily represent global ecosystem functioning particularly well due to sampling bias (Papale et al., 2015). Similarly, some PFTs are represented by very few sites, which makes PFT-specific parameterization challenging.
General Performance of Models in Simulating GPP
Mengoli et al. (2022) evaluated the P-model across 10 sites comprising boreal forest, temperate deciduous broadleaf forest, mixed forest, tropical forest, and temperate grassland, and found reasonable performance for these sites. In this study, we extended the evaluation of the optimality-based models to a wide range of sites representing various vegetation and climate types. We uncovered poor model performance at many sites, especially arid sites. Calculating a water availability indicator with a simple hydrological model and introducing a moisture stress function substantially improved the model's ability to simulate the annual average of GPP fluxes across many sites, including both water–limited and energy–limited sites. Including the moisture stress function improved model performance not only at the annual scale but across all temporal scales and aggregation levels. This highlights the importance of representing soil moisture conditions in modeling approaches that aim to accurately represent ecosystem functioning and vegetation response. Our findings are also consistent with Schaefer et al. (2012), who found mostly poor performance of 26 models in simulating GPP under dry conditions across EC sites in North America. Even though the drought stress function used in this study does not explicitly account for long-term legacy effects, our results still confirm the substantial effect of drought stress on GPP found in numerous other studies (Anderegg et al., 2015; Müller & Bahn, 2022; Z. Yu et al., 2017; X. Yu et al., 2022). However, coupling the hydrological model required calibrating nine additional parameters, which counters the vision of developing a parameter–sparse approach based on theories that demand less site- or site–year-specific fine-tuning of model parameters (Prentice et al., 2015). Further experimentation is needed to find a balance between the number of key model parameters that require calibration and an accurate representation of ecosystem processes.
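For illustration only, a minimal bucket-type water balance with a linear moisture stress scalar is sketched below; the hydrological model and drought stress function actually used in this study are more detailed (see Text S1 of Supporting Information S1) and are not reproduced here, and all parameter values are placeholders.

```python
import numpy as np

def bucket_moisture_stress(precip, et, capacity_mm=100.0, w0=50.0,
                           w_crit_frac=0.4):
    """Minimal illustrative soil-water bucket (daily time step).

    precip, et : arrays of precipitation and evapotranspiration (mm/day)
    capacity_mm: bucket size; w_crit_frac: relative water content below
    which an illustrative linear stress scalar (0..1) reduces GPP.
    Returns the water content and the stress scalar per day.
    """
    w = np.empty(len(precip))
    stress = np.empty(len(precip))
    w_prev = w0
    for t in range(len(precip)):
        w_prev = np.clip(w_prev + precip[t] - et[t], 0.0, capacity_mm)
        w[t] = w_prev
        rel = w_prev / capacity_mm
        stress[t] = np.clip(rel / w_crit_frac, 0.0, 1.0)
    return w, stress

# Illustrative coupling: gpp_stressed = stress * gpp_unstressed
p = np.array([0.0, 5.0, 0.0, 0.0, 12.0])
e = np.array([3.0, 3.5, 4.0, 4.0, 3.0])
print(bucket_moisture_stress(p, e))
```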
Regarding differences in model structure, we found that the semi-empirical models performed statistically better, that is, had lower AIC values than the optimality-based model at hourly and daily scales or aggregations for most of the parameterization experiments, even though the semi-empirical modeling experiments required more parameters to be calibrated. At the monthly aggregation level, the seasonal cycles were also significantly better captured by the parameter–heavy semi-empirical model parameterized with daily data for most of the parameterization experiments. However, at the annual aggregation level, the optimality-based model was comparatively better in most cases, and the more flexible semi-empirical model with a higher number of parameters did not substantially improve annual model performance. In an earlier study, Sims et al. (2005) found that midday gross CO2 flux is highly correlated with the daily and 8-day averages of gross CO2 flux, and that the midday values can be used to estimate fluxes at longer timescales, that is, daily and 8-day. In this study, we similarly found that the hourly and daily GPP estimates from the respective models can be used to reliably estimate fluxes at weekly and monthly scales. However, both models poorly estimated the annual average GPP flux per site.
Though the partial sensitivity functions of environmental variables used in the two models were found to be applicable for most of the sites, they can take many different forms and may vary with site conditions (Bao, Wutzler, et al., 2022). The EC sites were also affected by human management, such as irrigation, harvesting, and mowing, as well as natural disturbances, such as fire and pest attacks. These factors can affect the IAV of the GPP flux estimated from EC measurements. The models used in this study may not account for all of these factors due to structural limitations. For example, in the hydrological model, we only used precipitation and ET to calculate the water mass balance. However, human management (such as irrigation and drainage) can play an important role, and the estimates at managed sites, such as the irrigated maize site US-Ne1, may not be accurate.
The Importance of the Parameterization Approach on Estimating IAV of GPP
We also performed an inter–comparison of five different parameterization approaches for both an optimality-based model and a semi-empirical model across a large number of sites. We carried out these extensive evaluations to emphasize the importance of model parameterization in capturing the IAV of GPP and to gain higher confidence in our conclusions. Model parameterization largely determined model performance, and calibrated parameters captured the individual characteristics of sites or the climatic events of particular site–years (J. Wu et al., 2012). Earlier studies (Groenendijk et al., 2011; Huang et al., 2021) found high variability of model parameters across sites within a PFT and concluded that detailed parameterization was necessary to better simulate carbon fluxes, because vegetation functioning and climatic conditions can vary between sites and a simple PFT classification may not account for these variations. Similarly, we found that detailed model parameterization strategies, such as parameterization specific to site–years or sites, better predicted the annual average GPP fluxes, and that year-specific parameters explained some parts of the IAV of the GPP flux. At hourly and daily scales, we found slightly better model performance for site–year parameterization than for site-specific parameterization. This can mainly be attributed to variations in hydrological conditions between years at a site, such as drought or excess rainfall, which site–year–specific parameters may be better able to capture. However, at the annual scale, model performance was similar between site–year and site-specific parameterization, even though we assumed that parameterizing models using data from each site–year would be advantageous for capturing the IAV of GPP. Changes in parameters between years could reflect structural model issues, such as state variables that are not represented correctly in the model (e.g., soil moisture) or are absent from the model but potentially relevant for IAV (e.g., reserve or enzymatic pools, other biogeochemical cycles such as nitrogen, or biotic factors such as herbivory). Changes in parameters could also reflect observational artifacts, such as biases or changes in instrumentation. We would argue that the difference between site and site–year parameterization reflects possible performance gains achievable through improvements in model structure, or through statistically learning the temporal dynamics of parameters, which would add an inference component linking environmental forcing to the responses of missing processes. Moreover, as rapid changes in climatic characteristics have become more frequent in recent years, developing a generalized model structure to simulate carbon fluxes between years and/or between sites of similar vegetation types has become even more challenging (Knauer et al., 2023).
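The parameterization strategies essentially differ in how site–years are grouped before one parameter vector is calibrated per group. The sketch below illustrates such a grouping together with a CMA-ES calibration call using pycma (the optimizer mentioned in the Acknowledgments); the cost function, parameter vector, and record structure are hypothetical placeholders, not the study's implementation.

```python
import numpy as np
import cma  # pycma

def calibrate(cost, x0, sigma0=0.3):
    """Calibrate one parameter vector with CMA-ES (illustrative settings)."""
    es = cma.CMAEvolutionStrategy(x0, sigma0, {"verbose": -9})
    es.optimize(cost)
    return es.result.xbest

def group_site_years(records, strategy):
    """Group (site, year, pft, data) records according to a parameterization
    strategy: 'site_year', 'site', 'pft', or 'global'."""
    key = {
        "site_year": lambda r: (r["site"], r["year"]),
        "site":      lambda r: r["site"],
        "pft":       lambda r: r["pft"],
        "global":    lambda r: "all",
    }[strategy]
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    return groups

# One parameter set is then calibrated per group, e.g. (hypothetical names):
# for group_key, recs in group_site_years(records, "site").items():
#     params[group_key] = calibrate(lambda x: my_cost(x, recs), x0=np.zeros(5))
```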
The generalized model parameterization strategy, that is, global parameterization, was also dominated by PFTs such as ENF and GRA, which are represented by many sites, whereas certain PFTs, such as DNF, are represented by only one site in the FLUXNET2015 data set (Pastorello et al., 2020). This implies that global parameterization or parameter up-scaling experiments using the FLUXNET2015 data set may yield biased parameter sets that cannot be generalized to the global scale, or that a weighted site representation may be necessary. Besides model parameterization, a recent study by Zou et al. (2024) highlighted that the importance of each independent driver and the drivers' relative contributions vary over time and across ecosystem types; the relative importance of forcing variables may therefore be another relevant factor. We also found that both models showed the highest model performance at the sub-daily scale mostly for forest sites compared to savannas or grasslands, which in turn led to a poor simulation of IAV at many non-forest sites. In most cases, we found that the parameterization strategy played an important role in the variability of model performance across PFTs. For most PFTs, model performance differed statistically between per–site–year and global parameterization. However, there was no significant difference in model performance across PFTs between the two models.
Though we have demonstrated the capability of the optimality-based model that includes drought stress and of the semi-empirical model to simulate hourly GPP fluxes, accurate estimation of the IAV of GPP fluxes at the site level with these models requires further development. In particular, both models failed to capture the peak GPP of the diurnal cycle at many sites, even after model parameterization at a sub-daily scale and with an additional constraint on the IAV of GPP in the cost function. These underestimations at the hourly scale may have accumulated into larger errors when the fluxes were aggregated to the annual scale to study the IAV. We also showed that comparatively better model performance was achieved when the GPP peaks per site–year were better simulated. These results are similar to those of Lin et al. (2023), a study directed at evaluating terrestrial ecosystem models' capability to explain the IAV of GPP, which also found an underestimation of GPP. Though the occurrences of peak GPP in a diurnal cycle may be comparatively rare, an earlier study by Zscheischler et al. (2016) found that high GPP values during ideal weather conditions contributed most to the IAV of the GPP flux. While the study of Zscheischler et al. (2016) was performed for temperate forests, we found a similar pattern for sites representing diverse climate–vegetation types. Paschalis et al. (2015) also found that short-term variability of meteorological forcing can affect carbon and water fluxes from hourly to annual or even longer scales, especially if it affects soil water availability and similarly slow processes. Another study, by Fatichi and Ivanov (2014), based on synthetic data analysis, likewise concluded that random occurrences of favorable weather conditions and the subsequent higher carbon assimilation explain the IAV of aboveground NPP better than the average weather conditions of a year or growing season. Unlike some of the previous studies mentioned above, which mainly focused on correlations between GPP peaks and annual GPP from observed or synthetic data, here we found a direct correlation between model performance in simulating GPP peaks and model performance in simulating the IAV of GPP. It is also true that some of the peak values in EC-derived diurnal GPP can be outliers, that is, artifacts of data processing algorithms such as the gap-filling algorithm, and it can be hard to distinguish such outliers from true GPP peaks. However, the bias introduced by the gap-filling algorithm is relatively low (Moffat et al., 2007). Therefore, it is unlikely that the general underestimation of GPP peaks by the models is due to bias introduced during gap-filling.
The poor representation of the IAV of GPP can be attributed either to limitations of the models or to the model parameterization strategies. It is important to identify which seasonal phases of the GPP dynamics, for particular vegetation types or climatic zones, are not well represented by models when simulating the IAV of GPP. In our study, we accounted for biophysical forcings, such as meteorological variables, and used fAPAR to represent phenological dynamics. However, if phenological and GPP dynamics are decoupled, the timing of GPP onset and senescence can be another factor behind IAV, especially during climate extremes or large meteorological time shifts. Previous studies, such as Wolf et al. (2016), showed that the net effect of a climate extreme, the 2012 summer drought, on carbon uptake was compensated by a warmer spring, while increased soil moisture consumption during the warmer spring contributed to the drought conditions in summer. Similar findings were reported by van der Woude et al. (2023) for European forests, where the effect of the 2018 drought was offset by prolonged carbon uptake during a warm autumn. Bastos et al. (2020) and Smith et al. (2020) similarly reported that increased vegetation growth during the warm spring was responsible for soil moisture depletion and amplification of the 2018 drought effect in Europe. It is particularly important to focus on the meteorological sensitivity of GPP during periods of high productivity, where improvements in the prediction of high fluxes would tend to improve the description of IAV. Another avenue could be to decompose the model-performance metric (Gupta et al., 2009) used in the cost function, or to develop a more detailed model evaluation, to understand which other parts of the time series were not well constrained during model parameterization.
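For example, assuming the metric of Gupta et al. (2009) refers to the Kling–Gupta efficiency, its decomposition into correlation, variability, and bias components could be computed as sketched below; whether this is exactly the metric used in this study's cost function is not restated here, so treat it as an illustration.

```python
import numpy as np

def kge_components(sim, obs):
    """Kling-Gupta efficiency and its components (Gupta et al., 2009):
    r = correlation, alpha = variability ratio, beta = bias ratio."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, r, alpha, beta

# Synthetic illustration: a simulation with damped amplitude and a small bias.
obs = np.sin(np.linspace(0, 6.28, 100)) + 2.0
sim = 0.8 * obs + 0.1
print(kge_components(sim, obs))
```

Inspecting r, alpha, and beta separately indicates whether a misfit stems from timing, amplitude (e.g., underestimated peaks), or mean bias.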
Conclusions
We have demonstrated the capability of an improved version of an optimality-based model and a semi-empirical LUE model to simulate sub-daily or daily GPP fluxes across 198 EC sites representing 13 different vegetation types, including forests, grasslands, savannas, croplands, and tundra. We also applied various model parameterization strategies to systematically evaluate the factors affecting the IAV of GPP. The main conclusions from our study are:
-
We found that the semi-empirical model mostly produced better results at hourly, daily, and monthly scales compared to the optimality-based model. At the annual scale, the improvement in the performance of the semi-empirical model was not significant, even though more parameters were calibrated to flexibly capture the ecosystem dynamics. Both the optimality-based and semi-empirical models performed better at hourly, daily, weekly, and monthly scales than at the annual scale.
-
Model structure was an important factor behind the variability in both hourly and annual model performance when drought stress was not explicitly accounted for in the optimality-based model. However, differences in climate classes and parameterization strategies became the important factors behind the variability in hourly and annual model performance, respectively, when the previous version of the optimality-based model (without drought stress) was excluded from the analysis.
-
Explicitly accounting for drought stress in the optimality-based ecosystem model is a necessity, as drought stress proved to be an important factor controlling GPP fluxes at all temporal scales, including the annual aggregation. The explicit representation of drought stress improved the performance of the optimality-based model not only at arid sites but also at energy–limited sites.
-
Both models mostly performed better at forest sites than at grasslands or savannas, which may also lead to poor estimation of the IAV of GPP at many sites globally.
-
While these models generally performed well in simulating hourly GPP dynamics, small errors at the sub-daily scale, particularly in the estimation of GPP peaks, accumulated into larger errors at the annual scale and led to poor performance in explaining the IAV of GPP. We found that comparatively better annual model performance could be achieved when the peaks of GPP were better simulated.
Our results further suggest the need to focus on sub-daily GPP dynamics during the various seasonal phases, especially the highly productive ones, toward an improved constraint on GPP sensitivities. Furthermore, the better annual model performance achieved with a detailed parameterization strategy, such as site–year parameterization, signifies that temporally varying model parameters are necessary to better capture the variations in annual average GPP, and indicates that ecosystem functioning is not stable between years. These new insights can guide the development of models and parameterization strategies that simulate the inter–annual variations in ecosystem GPP more successfully, and improve our understanding of the global carbon cycle's response to changing climatic conditions.
Appendix A - Data Description
Table A1 Description of Forcing and Model Parameterization Data
| Abbreviation | Definition | Unit | Variable name in data set/remarks | Reference |
| a | GPP derived from EC-based net ecosystem exchange (NEE) using the night-time partitioning method. Used for model calibration and evaluation. | | GPP_NT_VUT_USTAR50 | Pastorello et al. (2020); Reichstein et al. (2005) |
| GPP_NT_VUT_05 | GPP derived from percentile 5 of 40 different estimates of NEE using the night-time partitioning method. Used to quantify uncertainty in EC-derived GPP. | | GPP_NT_VUT_05 | Pastorello et al. (2020); Reichstein et al. (2005) |
| GPP_NT_VUT_95 | GPP derived from percentile 95 of 40 different estimates of NEE using the night-time partitioning method. Used to quantify uncertainty in EC-derived GPP. | | GPP_NT_VUT_95 | Pastorello et al. (2020); Reichstein et al. (2005) |
| GPP_DT_VUT_05 | GPP derived from percentile 5 of 40 different estimates of NEE using the daytime partitioning method. Used to quantify uncertainty in EC-derived GPP. | | GPP_DT_VUT_05 | Pastorello et al. (2020); Lasslop et al. (2010) |
| GPP_DT_VUT_95 | GPP derived from percentile 95 of 40 different estimates of NEE using the daytime partitioning method. Used to quantify uncertainty in EC-derived GPP. | | GPP_DT_VUT_95 | Pastorello et al. (2020); Lasslop et al. (2010) |
| | Random uncertainty for NEE | | NEE_VUT_USTAR50_RANDUNC | Pastorello et al. (2020) |
| | Latent heat flux | | LE_F_MDS | Pastorello et al. (2020) |
| | Random uncertainty for latent heat flux | | LE_RANDUNC | Pastorello et al. (2020) |
| b | Incoming shortwave radiation | | SW_IN_F | Pastorello et al. (2020) |
| b, c | Net radiation | | NETRAD | Pastorello et al. (2020) |
| | Potential incoming shortwave radiation | | SW_IN_POT | Pastorello et al. (2020) |
| a | Incoming photosynthetic photon flux density | | PPFD_IN, gap-filled | Pastorello et al. (2020); see Section 3.4.2 of Stocker et al. (2020) for the gap-filling equation |
| b | Air temperature | | TA_F_MDS | Pastorello et al. (2020) |
| b | Vapor pressure deficit | Pa | VPD_F_MDS | Pastorello et al. (2020) |
| b, d | Precipitation | or | P | Pastorello et al. (2020) |
| | Atmospheric CO2 concentration dry air mole fractions from quasi-continuous measurements at Mauna Loa | ppm | CO2_mlo_surface-insitu_1_ccgg_DailyData (interpolated linearly to hourly scale). The measurements from Mauna Loa were used for all sites as the CO2 concentration measurements at EC sites are often noisy and discontinuous. | Thoning et al. (2021) |
| | Site elevation | m a.s.l. | Collected from literature | Bao, Wutzler, et al. (2022) |
| a | Evapotranspiration derived from latent heat flux | or | Calculated from the latent heat flux with a dependency on air temperature | Henderson-Sellers (1984) |
| | Random uncertainty for evapotranspiration | or | Calculated from the random uncertainty of the latent heat flux with a dependency on air temperature | Henderson-Sellers (1984) |
| | Potential evapotranspiration | or | Calculated using the method of Priestley and Taylor | Priestley and Taylor (1972) |
| | Cloudiness index | – | Calculated from incoming and potential incoming shortwave radiation | Bao, Wutzler, et al. (2022); Fu and Rich (1999); Turner et al. (2006) |
| | Water availability indicator | mm | Described in Text S1 of Supporting Information S1 | Bao, Wutzler, et al. (2022); Tramontana et al. (2016); Trautmann et al. (2018) |
| | Soil water supply | | Calculated using a parameter defined in Table 1 | Bao, Wutzler, et al. (2022) |
| | Normalized difference vegetation index | – | Daily NDVI from FluxnetEO v2 (MODIS) was linearly interpolated to hourly | Walther et al. (2022, 2023) |
| | Fraction of incident photosynthetic photon flux that is absorbed by vegetation | – | A linear relationship between this fraction and NDVI was assumed | Bao, Wutzler, et al. (2022); Myneni et al. (1997) |
| a | Data quality flags | – | 1.0 (good quality), 0.5 (medium quality), and 0.0 (bad quality) in the case of hourly data, representing the fraction of good-quality measured or gap-filled data from two half-hours. In the case of daily data, the flag can take any value between 0.0 and 1.0, representing the fraction of good-quality measured or gap-filled data in a day. Daily data with a flag value greater than 0.8 were considered good. | Pastorello et al. (2020); Nelson et al. (2024) |
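Table A1 states that ET and its random uncertainty were derived from the latent heat flux with a temperature dependency following Henderson-Sellers (1984). A minimal sketch of such a conversion is given below; the unit conventions (W m⁻² to mm per hour) are assumptions for illustration and may differ from the study's implementation.

```python
import numpy as np

def latent_heat_vaporization(t_air_celsius):
    """Latent heat of vaporization (J/kg) as a function of air temperature,
    following Henderson-Sellers (1984); temperature is converted to Kelvin."""
    t_k = np.asarray(t_air_celsius, float) + 273.15
    return 1.91846e6 * (t_k / (t_k - 33.91)) ** 2

def le_to_et(le_wm2, t_air_celsius):
    """Convert latent heat flux (W m-2) to evapotranspiration (mm per hour)."""
    lam = latent_heat_vaporization(t_air_celsius)
    return np.asarray(le_wm2, float) * 3600.0 / lam

print(le_to_et(150.0, 20.0))  # roughly 0.22 mm per hour for this example
```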
Appendix B - Data Screening for Model Parameterization
We used only good–quality data to calibrate model parameters. At the hourly scale, we selected GPP and ET data points as good quality when the values of their respective QC flags were 1 (Table A1). At the daily scale, we considered a GPP or ET data point as good when the value of the QC flag was greater than 0.8. We also removed any data gaps from the observed and simulated data (Table A1). There were certain negative values in the observed GPP, as it was calculated using the night-time partitioning method (Reichstein et al., 2005). If a negative GPP value occurred when the potential incoming shortwave radiation (Table A1) was zero, that is, during night hours, we replaced that data point with 0 and used it in the cost function. If negative GPP values occurred during daytime hours, we excluded them. A minimal sketch of these screening rules is given after this paragraph.
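The sketch below illustrates the hourly screening rules described above, assuming pandas-style columns with hypothetical names; it is not the study's actual processing code.

```python
import pandas as pd

def screen_hourly(df):
    """Screen hourly data following the rules in Appendix B (sketch).
    Column names (gpp_obs, et_obs, qc_gpp, qc_et, sw_in_pot) are
    illustrative, not the names used in the study's code."""
    good = (df["qc_gpp"] == 1) & (df["qc_et"] == 1)
    df = df.loc[good].copy()
    # Negative GPP at night (potential incoming shortwave radiation == 0)
    # is set to 0; negative GPP during daytime is excluded.
    night = df["sw_in_pot"] == 0
    df.loc[night & (df["gpp_obs"] < 0), "gpp_obs"] = 0.0
    df = df.loc[~((~night) & (df["gpp_obs"] < 0))]
    return df.dropna(subset=["gpp_obs", "et_obs"])
```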
Appendix C - Data Screening for Evaluation of Model Performance
The good-quality data at the hourly scale were selected using the same criteria as described in Appendix B. The data screening at the daily scale was also the same as in Appendix B when the LUE model was parameterized using daily data. For all other cases, we assigned a flag (0 = not considered, 1 = considered) to identify which data points were used during model parameterization. We aggregated this flag to daily, weekly, monthly, and annual scales by taking averages; the aggregated flag then indicates the fraction of good-quality data used to calculate a data point at a given temporal resolution. We only used data points at coarser temporal resolutions that were calculated from more than 50% (flag value > 0.5) good-quality data points at the hourly or daily resolution. We calculated monthly and annual model performance metrics for a site only if at least three data points were present. We could not calculate annual metrics for 76 and 85 sites due to the low number of good-quality site–years when the annual data were aggregated from hourly and daily data, respectively. For the same reason, monthly metrics were not calculated for three sites when they were aggregated from daily data.
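The aggregation and screening logic described above could look like the following sketch, again with hypothetical column names, a datetime "time" column, and a recent pandas version assumed; it is illustrative rather than the study's code.

```python
import pandas as pd

def aggregate_with_qc(hourly, value_col="gpp", used_col="used_in_param",
                      freq="YE", min_fraction=0.5, min_points=3):
    """Aggregate hourly values to a coarser resolution, keep only periods
    built from more than `min_fraction` good-quality points, and return the
    aggregate only if at least `min_points` periods remain (sketch)."""
    grouped = hourly.set_index("time").resample(freq)
    agg = grouped[value_col].mean()
    frac_good = grouped[used_col].mean()
    agg = agg[frac_good > min_fraction]
    return agg if len(agg) >= min_points else None
```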
Appendix D - Median Values of Model Performance
The median values of the model performance metric (NNSE) plotted in Figure 2 are summarized in Table D1.
Table D1 Median NNSE Obtained at Each Modeling Experiment at Hourly/Daily Scale and Annual Aggregations
| Temporal scale/aggregation | Model | Per site–year | Per site using the IAV constraint | Per site | Per PFT | Global |
| Hourly/daily scale | | 0.827 | 0.799 | 0.805 | 0.738 | 0.712 |
| | | 0.478 | 0.470 | 0.469 | 0.461 | 0.490 |
| | | 0.855 | 0.837 | 0.838 | 0.758 | 0.730 |
| | | 0.837 | 0.796 | 0.796 | 0.686 | 0.642 |
| Annual aggregation | | 0.543 | 0.405 | 0.373 | 0.201 | 0.143 |
| | | 0.018 | 0.019 | 0.018 | 0.018 | 0.019 |
| | | 0.628 | 0.460 | 0.460 | 0.187 | 0.189 |
| | | 0.704 | 0.449 | 0.475 | 0.233 | 0.151 |
Appendix E - Mean Values of Fraction of Site–Years per Site for Which Simulated GPP Was Within Uncertainty Range
The mean values of the fractions of site–years per site for which the annual average of simulated GPP was within the uncertainty range of EC-derived GPP are plotted in Figure 3 and summarized in Table E1.
Table E1 Mean Values of Fractions of Site–Years per Site for Which the Annual Average of Simulated Gross Primary Production Was Within the Uncertainty Range of Eddy-Covariance-Derived GPP
| Model | Per site–year | Per site using the IAV constraint | Per site | Per PFT | Global |
| | 0.869 | 0.757 | 0.773 | 0.481 | 0.409 |
| | 0.123 | 0.128 | 0.131 | 0.129 | 0.131 |
| | 0.931 | 0.831 | 0.840 | 0.531 | 0.430 |
| | 0.942 | 0.835 | 0.838 | 0.507 | 0.389 |
Acknowledgments
We would like to express our gratitude to the FLUXNET community, including regional networks: AmeriFlux, AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, FluxnetCanada, GreenGrass, ICOS, KoFlux, LBA, NECC, OzFlux-TERN, TCOSSiberia, and USCCC for their contributions in developing the eddy covariance based FLUXNET2015 data set, which was invaluable to our study. We also thank all the site investigators for generously sharing their data with the respective networks. The FLUXNET eddy covariance data processing and harmonization was carried out by the ICOS Ecosystem Thematic Center, AmeriFlux Management Project, and Fluxdata project of FLUXNET, with the support of CDIAC, and the OzFlux, ChinaFlux and AsiaFlux offices. We acknowledge suggestions received from Jacob Nelson regarding the FLUXNET2015 data set, suggestions received from Sophia Walther regarding the FluxnetEO data set, assistance received from Ulrich Weber in some of the data processing steps, FLUXCOM team for preparing hourly aggregates of FLUXNET2015 data set, assistance received from Giulia Mengoli with P-model codes, assistance received from Nikolaus Hansen regarding the usage of pycma, and members of the Model–Data Integration group at Max Planck Institute for Biogeochemistry for sharing constructive feedback. We thank Olaf Kolle, J. Rose Cleverly, and Shirley Papuga for providing clarifications and further information on site characteristics. We thank Donatella Zona, J. Rose Cleverly, Pasi Kolari, Julia Boike, Iris Feigenwinter, Donatella Spano, Javier Houspanossian, and Wang Yi for providing feedback on an earlier version of the manuscript. RD acknowledges the funding from the International Max Planck Research School for Global Biogeochemical Cycles (IMPRS-gBGC) for carrying out his doctoral research. TT considers this a contribution to the Swedish National Space Agency projects (SNSA Dnr, 2021-00144; 2021-00111) and also acknowledges funding from FORMAS (Dnr 2021-00644; 2023–02436). LŠ acknowledges support from the Ministry of Education, Youth and Sports of the Czech Republic within the CzeCOS program (Grant LM2023048) and the AdAgriF project (CZ.02.01.01/00/22_008/0004635). LH acknowledges funding from the SNF funded projects ICOS-CH Phase 1, 2, and 3 (20FI21_148992, 20FI20_173691, and 20F120_198227). SS acknowledges Horizon Europe funding (Open-Earth-Monitor Cyberinfrastructure project, 101059548). M. Roland and BG acknowledge the support of the Research Foundation Flanders (FWO) for the support to ICOS Research Infrastructure. LM acknowledges the funding from the Forest Services, Autonomous Province of Bozen. GW acknowledges financial support from the Autonomous Province of Bozen via the CarboST project. Large language models (LLM), that is, ChatGPT (version GPT-4-1106-vision-preview API) and GitHub Copilot (v1.172.0) were used to generate certain code snippets to perform some of the analyses in this study, and to generate code documentation, followed by careful evaluations by authors. However, no texts in this research paper were written with the aid of LLM. We thank the three anonymous reviewers, and handling editor, Natasha MacBean for their constructive feedback that contributed to the improvement of this research article. Open Access funding enabled and organized by DEAL Konsortium.
Data Availability Statement
The codes that were used to perform all the necessary analyses and plot all the figures in this study are available at (De, 2025). The data from eddy covariance sites are available through FLUXNET at , and the Digital Object Identifier (DOI) of the data set for each site are listed in the Table S8 of Supporting Information S1 (Pastorello et al., 2020; FLUXNET.org, 2024a). The FluxnetEO MODIS version 2 data set is available at (Walther et al., 2022, 2023). The ERA5 data set is available at Hersbach et al. (2023). The ERA-Interim v2.0 data set has been currently archived by ECMWF; however, it can be accessed following these (Berrisford et al., 2011). Atmospheric CO2 measurements at Mauna Loa observatory are available at (Thoning et al., 2021).
Erratum
Since original publication of this article, the following has been removed from the first paragraph of Section 2.1.1: “The probable reasons behind using the ‘P’ in the P-model are (a) ‘P’ stands for photosynthesis, (b) classically, GPP used to be denoted by ‘P’ (Monteith, 1972), and (c) the initial of the lead author (Prentice et al., 2014) who formulated the theories behind the model starts with ‘P’ (B. D. Stocker, personal communication, 06 May 2024).”
Abstract
A long‐standing challenge in studying the global carbon cycle has been understanding the factors controlling inter–annual variation (IAV) of carbon fluxes, and improving their representations in existing biogeochemical models. Here, we compared an optimality‐based model and a semi‐empirical light use efficiency model to understand how current models can be improved to simulate IAV of gross primary production (GPP). Both models simulated hourly GPP and were parameterized for (a) each site–year, (b) each site with an additional constraint on IAV (), (c) each site, (d) each plant–functional type, and (e) globally. This was followed by forward runs using calibrated parameters, and model evaluations using Nash–Sutcliffe efficiency (NSE) as a model‐fitness measure at different temporal scales across 198 eddy‐covariance sites representing diverse climate–vegetation types. Both models simulated hourly GPP better (median normalized NSE: 0.83 and 0.85) than annual GPP (median normalized NSE: 0.54 and 0.63) for most sites. Specifically, the optimality‐based model substantially improved from NSE of −1.39 to 0.92 when drought stress was explicitly included. Most of the variability in model performances was due to model types and parameterization strategies. The semi‐empirical model produced statistically better hourly simulations than the optimality‐based model, and site–year parameterization yielded better annual model performance. Annual model performance did not improve even when parameterized using . Furthermore, both models underestimated the peaks of diurnal GPP, suggesting that improving predictions of peaks could produce better annual model performance. Our findings reveal current modeling deficiencies in representing IAV of carbon fluxes and guide improvements in further model development.