Assimilation of carbonyl sulfide (COS) fluxes

Full text

Turn on search term navigation

1 Introduction

Overwhelmingly due to anthropogenic fossil fuel and carbonate emissions, as well as land use and land cover change (Arias et al., 2021), atmospheric carbon dioxide (CO $_{2}$ ) concentrations have increased at an unprecedented rate since the Industrial Revolution, and the global climate has been profoundly affected. As a key component of the Earth system, the terrestrial biosphere has absorbed about 30 % of anthropogenic CO $_{2}$ emissions since 1850 (Friedlingstein et al., 2022). However, in line with large-scale global warming, the structure and function of the terrestrial biosphere have changed rapidly (Grimm et al., 2013; Arias et al., 2021; Moore and Schindler, 2022). As a consequence, terrestrial carbon fluxes are subject to great uncertainty (MacBean et al., 2022).

Terrestrial ecosystem models have been an important tool used to investigate the net effect of complex feedback loops between the global carbon cycle and climate change (Zaehle et al., 2005; Fisher et al., 2014; Fisher and Koven, 2020). Meanwhile, with the advancement of modern observational techniques, a rapidly increasing number of satellite- and ground-based observational datasets have played an important role in studying the spatiotemporal distribution and mechanisms of the terrestrial ecosystem carbon fluxes (Rodell et al., 2004; Quirita et al., 2016). Observations, such as sun-induced chlorophyll fluorescence (Schimel et al., 2015) and soil moisture (Wu et al., 2018), have been used to estimate or constrain carbon fluxes in terrestrial ecosystems (Scholze et al., 2017). Carbonyl sulfide (COS) has emerged as a promising proxy for understanding terrestrial carbon uptake and plant physiology (Montzka et al., 2007; Campbell et al., 2008), since it is taken up by plants through the same pathway of stomatal diffusion as CO $_{2}$ (Goldan et al., 1988; Sandoval-Soto et al., 2005; Seibt et al., 2010) and completely removed by hydrolysis without any back-flux in leaves under normal conditions (Protoschill-Krebs et al., 1996; Stimler et al., 2010).

Plants open/close leaf stomata in order to regulate the water and CO $_{2}$ transit during transpiration and photosynthesis (Daly et al., 2004). As an important probe for characterizing stomatal conductance, COS has shown great potential to constrain plant photosynthesis and transpiration and to improve understanding of the water–carbon coupling (Wohlfahrt et al., 2012; Asaf et al., 2013; Wehr et al., 2017; Kooijmans et al., 2019; Sun et al., 2022; Zhu et al., 2024). A number of empirical or mechanistic COS plant uptake models (Campbell et al., 2008; Wohlfahrt et al., 2012; Berry et al., 2013) and soil exchange models (Kesselmeier et al., 1999; Berry et al., 2013; Launois et al., 2015; Sun et al., 2015; Whelan et al., 2016; Ogée et al., 2016; Whelan et al., 2022) have been developed to simulate COS fluxes in order to more accurately estimate gross primary productivity (GPP), stomatal conductance, transpiration. However, due to the lack of ecosystem-scale measurements of the COS flux (Brühl et al., 2012; Wohlfahrt et al., 2012; Kooijmans et al., 2021), only a few studies were conducted to systematically assess the ability of COS to simultaneously constrain photosynthesis, transpiration, and other related processes in ecosystem models.

Data assimilation is an approach that aims at producing physically consistent estimates of the dynamical behavior of a model by combining information in process-based models and observational data (Liu and Gupta, 2007; Law et al., 2015). It has been widely applied in geophysics and numerical weather prediction (Tarantola, 2005). In the past few decades, substantial efforts have been put into the use of satellite- (Knorr et al., 2010; Kaminski et al., 2012; Deng et al., 2014; Scholze et al., 2016; Norton et al., 2018; Wu et al., 2018) and ground-based (Knorr and Heimann, 1995; Rayner et al., 2005; Santaren et al., 2007; Kato et al., 2013; Zobitz et al., 2014) observational datasets to constrain or optimize the photosynthesis-, transpiration-, and energy-related parameters and variables of terrestrial ecosystem models via data assimilation techniques. More specifically, by applying data assimilation methods to process-based models, not only can the observed dynamics of ecosystems be more accurately portrayed, but our understanding of ecosystem processes can also be deepened with respect to their responses to climate changes (Luo et al., 2011; Keenan et al., 2012; Niu et al., 2014).

In this study, we present the newly developed adjoint-based Nanjing University Carbon Assimilation System (NUCAS v1.0). NUCAS v1.0 is designed to assimilate multiple observational data streams, including COS fluxes, to improve the process-based Biosphere–atmosphere Exchange Process Simulator (BEPS) (Liu et al., 1997), which has been specifically extended for simulating the ecosystem COS flux with the advanced two-leaf model that is driven by satellite observations of leaf area index (LAI).

In this context, the main questions that we aim to answer in this paper are as follows:

What parameters is the COS simulation sensitive to, and how do these parameters change in the assimilation of observed ecosystem-scale COS fluxes?
How effective is the assimilation of COS fluxes in improving the carbon, water, and energy balance for different ecosystems (including evergreen needleleaf forest, deciduous broadleaf forest, C $_{3}$ grass, and C $_{3}$ crop)?
Which processes are constrained by the assimilation of COS fluxes, and what are the mechanisms leading to adjustments of the corresponding process parameters?
How robust is NUCAS when optimizing over a single site and over two sites simultaneously?

To achieve these objectives, COS flux observations across a wide range of ecosystems (including evergreen needleleaf forest, deciduous broadleaf forest, C

_{3}

grass, and C

_{3}

crop) are assimilated into NUCAS to optimize the model parameters using the four-dimensional variational (4D-Var) data assimilation approach, and the optimization results are evaluated against in situ observations. Materials and methods used in our study are described in Sect. 2, which introduces the BEPS model and NUCAS, along with the data used to drive BEPS or for assimilation into NUCAS, and the parameters chosen to be optimized in this study. The results are presented in Sect. 3, including the fit of COS simulations to observations, the variation and impact of parameters on simulated COS, and the comparison and evaluation of model outputs. Section 4 discusses the impacts of the COS assimilation on parameters and processes related to the water–carbon cycle and energy exchange and discusses the influence of uncertain inputs, in particular, impacts of LAI on posterior parameter values. In addition, caveats and implications of assimilating COS flux are summarized. Finally, conclusions are laid out in Sect. 5.

2 Materials and methods

2.1 NUCAS data assimilation system

2.1.1 NUCAS framework

NUCAS is built around the generic satellite-data-driven ecosystem model BEPS and applies the 4D-Var data assimilation method (Talagrand and Courtier, 1987). The BEPS model uses satellite-derived one-sided LAI to drive the phenology dynamics and separates sunlit and shaded leaves in calculating canopy-level energy fluxes and photosynthesis. It further features detailed representations of water and energy processes (Fig. 1). These features render BEPS more advanced in representing ecosystem processes than standard ecosystem models (Richardson et al., 2012) with fewer parameters to be calibrated owing to the LAI-driven phenology.

Figure 1

Schematic of the Nanjing University Carbon Assimilation System (NUCAS). (a) Illustration of a two-leaf model coupling stomatal conductance, photosynthesis, transpiration, and COS uptake and an empirical model for simulating soil COS fluxes in NUCAS. (b) Data assimilation flowchart of NUCAS. Ovals represent input (blue-gray) and output (green) data. The boxes and rhombi represent the calculation and judgment steps. The solid black line represents the diagnostic process, the solid blue line represents the prognostic process, and the input datasets of BEPS (in the dashed box) are used in both the diagnostic process and the prognostic process.

[Figure omitted. See PDF]

Data assimilation is performed in two sequential steps: firstly, an inversion step adjusts the values of parameters controlling photosynthesis, energy balance, hydrology, and soil biogeochemical processes to match the observations. Secondly, the posterior parameters obtained in the first step are used as input data for the second step, in which the BEPS model is re-run to obtain the posterior model variables. The schematic of the system is shown in Fig. 1.

Considering model and data uncertainties, NUCAS implements a probabilistic inversion concept (Talagrand and Courtier, 1987; Tarantola, 1987, 2005) by using Gaussian probability density functions to combine the dynamic model and observations to obtain an estimate of the true state of the system and model parameters (Talagrand, 1997; Dowd, 2007). Hereby, we minimize the following cost function: 1 $J (x) = \frac{1}{2} [{(M (x) - O)}^{T} C_{O}^{- 1} (M (x) - O) + {(x - x_{0})}^{T} C_{x}^{- 1} (x - x_{0})],$ where $O$ and $M$ denote vectors of observations and their modeled counterparts, respectively, and $x$ and $x_{0}$ denote the control parameter vector with current and prior values, respectively. $C_{O}$ and $C_{x}$ denote the uncertainty covariance matrices for observations and prior parameters. Both matrices are diagonal, expressing the assumption that observation uncertainties and parameter uncertainties are independent (Rayner et al., 2005). This definition of the cost function contains both the mismatch between modeled and observed COS fluxes and the mismatch between current and prior parameter values (Rayner et al., 2005).

To determine an optimal set of parameters which minimizes $J$ , a gradient-based optimization algorithm performs an iterative search (Wu et al., 2020). In each iteration, the gradient of $J$ is calculated by applying the adjoint of the model, where the model is run backward to efficiently compute the sensitivity of $J$ and with respect to $x$ (Rayner et al., 2005). The gradient of $J$ is used to define a new search direction. The adjoint model is an efficient sensitivity analysis tool for calculating the parametric sensitivities of complex numerical model systems (An et al., 2016). The computational cost of it is independent of the number of parameters and is in the current case comparable to three to four evaluations of $J$ . In this study, all derivative code is generated from the model code by the automatic differentiation tool TAPENADE (Hascoët and Pascual, 2013). The derivative with respect to each parameter was validated against finite differences of model simulations, which showed agreement within the accuracy of the finite difference approximation. The minimization of the cost function is implemented in a normalized parameter space where the parameter values are measured in multiples of their respective standard deviations with Gaussian priors (Kaminski et al., 2012). The model parameters are the various constants that are not influenced by the model state. Therefore, while they may change between plant function types (PFTs) to reflect different conditions and physiological mechanisms, they will not change in time (Rayner et al., 2005).

2.1.2 BEPS basic model

The BEPS model (Liu et al., 1997; Chen et al., 1999, 2012) is a process-based diagnostic model driven by remotely sensed vegetation data, including LAI, clumping index, and land cover type, as well as meteorological and soil data (Chen et al., 2019). With the consideration of coupling among terrestrial carbon, water, and nitrogen cycles (He et al., 2021), the BEPS model now consists of photosynthesis, energy balance, hydrological, and soil biogeochemical modules (Ju et al., 2006; Liu et al., 2015). It stratifies whole canopies into sunlit and shaded leaves to calculate carbon uptake and transpiration for these two groups of leaves separately (Liu et al., 2015). For each group of leaves, the GPP is calculated by scaling Farquhar's leaf biochemical model (Farquhar et al., 1980) up to canopy level with an updated temporal and spatial scaling scheme (Chen et al., 1999), and the stomatal conductance is calculated using a modified version of the Ball–Berry (BB) model (Ball et al., 1987; Ju et al., 2006). Evapotranspiration (ET) is calculated as the summation of sunlit leaf and shaded leaf transpiration, evaporation from soil and wet canopy, and sublimation from snow storage on the ground surface (Liu et al., 2003). The BEPS model stratifies the soil profile into multiple layers (five were used in this study) and simulates temperature and water content from each layer (Ju et al., 2006). The soil water content is then used to adjust stomatal conductance considering the water stress impacts (Ju et al., 2010; He et al., 2021). Over the last few decades, the BEPS model has been continuously improved and used for a wide variety of terrestrial ecosystems (Schwalm et al., 2010; Liu et al., 2015).

The previous version of BEPS considers a total of 6 PFTs and 11 soil textures (Chen et al., 2012). We use the same soil texture but added 4 PFTs to BEPS in order to better discriminate vegetation types, especially the C $_{4}$ grass and C $_{4}$ crop. Detailed information on these 10 PFTs and 11 soil textures is given in Table S1.

2.1.3 COS modeling

The ecosystem COS flux, $F_{COS, ecosystem}$ , includes both plant COS uptake $F_{COS, plant}$ and soil COS flux exchange $F_{COS, soil}$ (Whelan et al., 2016). In this study, these two components were modeled separately. The canopy-level COS plant uptake $F_{COS, plant}$ (pmol m $^{- 2}$ s $^{- 1}$ ) was calculated by upscaling the resistance analog model of COS uptake, as presented by Berry et al. (2013) with the upscaling scheme recommended by Chen et al. (1999). Specifically, considering the different responses of foliage to diffuse and direct solar radiation (Gu et al., 2002), $F_{COS, plant}$ is calculated as

2 $F_{COS, plant} = F_{COS, sunlit} {LAI}_{sunlit} + F_{COS, shaded} {LAI}_{shaded},$ where ${LAI}_{sunlit}$ and ${LAI}_{shaded}$ are the LAI values (m $^{2}$ m $^{- 2}$ ) of sunlit and shaded leaves, respectively. $F_{COS, sunlit}$ and $F_{COS, shaded}$ are the leaf-level COS uptake rate (pmol m $^{- 2}$ s $^{- 1}$ ) of sunlit and shaded leaves, respectively. The leaf-level COS uptake rate $F_{COS, leaf}$ is calculated as 3 $F_{COS, leaf} = {COS}_{a} \times {(\frac{1.94}{g_{sw}} + \frac{1.56}{g_{bw}} + \frac{1}{g_{COS}})}^{- 1},$ where ${COS}_{a}$ is the COS mole fraction in the bulk air and $g_{sw}$ and $g_{bw}$ are the stomatal conductance and leaf laminar boundary layer conductance to water vapor (H $_{2}$ O), respectively (Berry et al., 2013). The factors 1.94 and 1.56 account for the smaller diffusivity of COS with respect to H $_{2}$ O (Seibt et al., 2010; Stimler et al., 2010). The apparent conductance for COS uptake from the intercellular airspaces is denoted by $g_{COS}$ and combines the mesophyll conductance and the biochemical reaction rate of COS and carbonic anhydrase (CA). Independent studies indicate that both CA activity and mesophyll conductance tend to scale with the photosynthetic capacity or the maximum carboxylation rate of Rubisco (Badger and Price, 1994; Evans et al., 1994), such that 4 $g_{COS} = α \times V_{cmax},$ where $α$ is a parameter that is calibrated to observations of simultaneous measurements of COS and CO $_{2}$ uptake (Stimler et al., 2012). Analysis of these measurements yields estimates of $α$ of $\sim$ 1400 for C $_{3}$ and $\sim$ 7500 for C $_{4}$ species (Stimler et al., 2012; Haynes et al., 2020). According to the COS modeling scheme of the Simple Biosphere Model (version 4.2) (Haynes et al., 2020), $g_{COS}$ can be calculated as 5 $\begin{aligned} g_{COS} & = 1.4 \times 10^{3} \times (1.0 + 5.33 \times F_{C_{4}}) \times 10^{- 6} \\ \times F_{APAR} \times f_{w} \times V_{cmax}, \end{aligned}$ where $F_{C_{4}}$ denotes the C $_{4}$ plant flag, which takes the value of 1 when the vegetation is C $_{4}$ plants and 0 otherwise. $f_{w}$ is a soil moisture stress factor describing the sensitivity of $g_{sw}$ to soil water availability (Ju et al., 2006). $F_{APAR}$ is the scaling factor for leaf radiation, calculated as 6 $F_{APAR} = 1 - e^{(- 0.45 \times LAI)} .$ $F_{COS, soil}$ is taken as the combination of abiotic COS flux $F_{COS, abiotic}$ and biotic COS flux $F_{COS, biotic}$ (Whelan et al., 2016). 7 $F_{COS, soil} = F_{COS, abiotic} + F_{COS, biotic}$ $F_{COS, abiotic}$ is controlled by abiotic degradation of soil organic matter (Whelan and Rhew, 2015) and can be described as an exponential function of the temperature of soil $T_{soil}$ (°C). 8 $F_{COS, abiotic} = e^{(alpha + beta \times T_{soil})},$ where alpha (unitless) and beta (°C $^{- 1}$ ) are parameters determined using the least-squares fitting approach.

$F_{COS, biotic}$ is attributed to CA in microbial communities (Sauze et al., 2017) and calculated according to Behrendt et al. (2014) and Whelan et al. (2016): 9 $F_{COS, biotic} = F_{opt} (\frac{SWC}{{SWC}_{opt}}) \times e^{- a (\frac{SWC}{{SWC}_{opt}} - 1)},$ where $a$ is the curve shape constant, SWC is the soil moisture (percent volumetric water content), and $F_{opt}$ denotes the optimal biotic COS uptake (pmol m $^{- 2}$ s $^{- 1}$ ) at optimum soil moisture SWC $_{opt}$ . The curve shape constant $a$ can be determined based on SWC $_{opt}$ , $F_{opt}$ , and COS flux ( $F_{g}$ ) under another soil moisture condition (SWC $_{g}$ and SWC $_{g}$ $>$ SWC $_{opt}$ ) as follows: 10 $a = ln⁡ (\frac{F_{opt}}{F_{{SWC}_{g}}}) \times {(ln⁡ (\frac{{SWC}_{opt}}{{SWC}_{g}}) + (\frac{{SWC}_{g}}{{SWC}_{opt}} - 1))}^{- 1} .$ Here we use the parameterization scheme of soil COS modeling from Whelan et al. (2016, 2022); see Tables S2 and S3 for details. Specifically, with reference to Abadie et al. (2022) and Whelan et al. (2022), the mean modeled soil water content (SWC) and temperature of the top 9 cm of the soil profile in BEPS were utilized to drive the COS soil model in this study, and the mean modeled SWC and temperature were calculated through a weighted average considering the depth of each soil layer. A more detailed description about the soil hydrology and stomatal conductance modeling approach of BEPS is provided in Appendix A.

2.2 Model parameters

NUCAS v1.0 can optimize 76 parameters belonging to BEPS. Of these parameters, some are global (i.e., the ratio of photosynthetically active radiation to shortwave radiation (f_leaf)) and others are differentiated by PFT (i.e., maximum carboxylation rate of Rubisco at 25 °C ( $V_{cmax 25}$ )) or soil texture class (i.e., Ksat $_{scalar}$ , the scaling factor of saturated hydraulic conductivity (Ksat)). The prior values of the parameters are taken as model defaults which have been tuned in past model development and validation studies (Kattge et al., 2009; Chen et al., 2012). The prior uncertainty of parameters is set based on previous research, i.e., Ryu et al. (2018) and Chen et al. (2022). For a more detailed description of these parameters, see Table S4 in the Supplement.

2.3 Site description

In this study, NUCAS was operated at seven sites distributed over the Eurasian and North American continents characterized as boreal, temperate, and subtropical regions (Fig. 2) based on field observations collected from several studies. These sites were representative of different climate regions and land cover types (in the model represented by PFTs and soil textures, as depicted in Table 1). They contained 4 of the 10 PFTs used in BEPS and 3 of the 11 soil textures. The sites comprise AT-Neu, located at an intensively managed temperate mountain grassland near the village of Neustift in the Stubai Valley, Austria (Hörtnagl et al., 2011; Spielmann et al., 2020a); the Danish ICOS (Integrated Carbon Observation System) research infrastructure site (DK-Sor), which is dominated by European beech (Braendholt et al., 2018; Spielmann et al., 2019a); the Las Majadas del Tiétar site (ES-Lma) located in western Spain with a Mediterranean savanna ecosystem (El-Madany et al., 2018; Spielmann et al., 2019a); the Hyytiälä Forest Station (FI-Hyy) located in Finland and dominated by Scots Pine (Bäck et al., 2012; Vesala et al., 2022); an agricultural soybean field measurement site (IT-Soy) located in Italy (Spielmann et al., 2019a); the Harvard Forest Environmental Monitoring Site (US-Ha1) which is dominated by red oak and red maple in Petersham, Massachusetts, USA (Urbanski et al., 2007; Wehr et al., 2017); and the Wind River Experimental Forest site (US-Wrc), located within the Gifford Pinchot National Forest in southwestern Washington state, USA, with 478 ha of preserved old-growth evergreen needleleaf forest (Rastogi et al., 2018a). For further information on all sites, see publications listed in Table 1.

Figure 2

Locations of the seven studied sites. Sites sharing the same plant function type are represented with consistent colors. The background map corresponds to the “Nature Earth I” map (https://www.naturalearthdata.com, last access: November 2023). ENF and DBF denote evergreen needleleaf forest and deciduous broadleaf forest, respectively.

[Figure omitted. See PDF]

Table 1

Site characteristics. Site identification includes the country initials and a three-letter name for each site. Locations of the sites are provided by the latitude (Lat) and longitude (Long). PFTs covered by the sites are evergreen needleleaf forest (ENF), deciduous broadleaf forest (DBF), C $_{3}$ grass, and C $_{3}$ crop. Soil textures covered by the sites are sandy loam, slit loam, and loam.

Site name	AT-Neu	DK-Sor	ES-Lma	FI-Hyy	IT-Soy	US-Ha1	US-Wrc
Lat (° N)	47.12	55.49	39.94	61.85	45.87	42.54	45.82
Long (° E)	11.32	11.64	$- 5.77$	24.29	13.08	$- 72.17$	$- 121.95$
PFT	C $_{3}$ grass	DBF	C $_{3}$ grass	ENF	C $_{3}$ crop	DBF	ENF
Soil texture	Sandy loam	Sandy loam	Sandy loam	Sandy loam	Slit loam	Sandy loam	Loam
LAI*	3.88	5.0	1.82	4.0	2.3	5.0	8.7
Year	2015	2016	2016	2013–2017	2017	2012–2013	2014
References	Spielmann et al. (2020)	Spielmann et al. (2019)	Spielmann et al. (2019)	Sun et al. (2018);Vesala et al. (2022);Kohonen et al. (2022)	Spielmann et al.(2019); Abadie et al.(2022)	Commane et al. (2015); Wehr et al. (2017)	Shaw et al. (2004);Rastogi et al. (2018)

$^{*}$ Mean one-sided LAI (m $^{2}$ m $^{- 2}$ ) during the experimental period.

2.4 Data

NUCAS was driven by several temporally and spatially variant and invariant datasets. The CO $_{2}$ and COS mole fractions in the bulk air were assumed to be spatially invariant over the globe and to vary annually. The CO $_{2}$ mole fraction data in this study are taken from the Global Monitoring Laboratory (https://gml.noaa.gov/ccgg/trends/global.html, last access: July 2022). For the COS mole fraction, the average of the COS mole fraction observations from sites SPO (South Pole) and MLO (Mauna Loa, United States) was utilized to drive the model, and the data are publicly available online at https://gml.noaa.gov/hats/gases/OCS.html (last access: July 2022). The other input data include a remotely sensed LAI dataset, a meteorological dataset, and a soil dataset. Additionally, in order to conduct data assimilation experiments and to evaluate the effectiveness of the assimilation of COS fluxes, field observations including the ecosystem-scale (eddy-covariance or gradient-based) COS flux, GPP, sensible heat ( $H$ ), ET, and SWC collected at the sites were used.

2.4.1 LAI dataset

The LAI dataset used here comprises the GLOBMAP global leaf area index product (Version 3) (see GLOBMAP global leaf area index since 1981; Zenodo (https://doi.org/10.5281/zenodo.4700264, Liu et al., 2021)), the Global Land Surface Satellite (GLASS) LAI product (Version 3) (acquired from https://doi.org/10.12041/geodata.GLASS_LAI_MODIS(0.05 D).ver1.dbTS1, Xiao et al., 2016), and the level-4 MODIS global LAI product (see LP DAAC – MOD15A2H (https://doi.org/10.5067/MODIS/MOD15A2H.006, Myneni et al., 2015)). The GLOBMAP LAI product quantifies leaf area index at a spatial resolution of 8 $\times$ 8 km and a temporal resolution of 8 d (Liu et al., 2012). The GLASS LAI product is generated every 8 d at a spatial resolution of 1 $\times$ 1 km (Xiao et al., 2016), and the MODIS LAI is an 8 d composite dataset with 500 $\times$ 500 m pixel size. As a default, we used GLOBMAP products for assimilation experiments as much as possible given their good performance in the BEPS applications to various cases (Chen et al., 2019). The GLASS and MODIS LAI products were used to investigate the effect of the LAI products on the parameter optimization results. Also, according to Spielmann et al. (2019a), the GLOBMAP product had considerably underestimated the LAI at the DK-Sor site in June 2016, and we noticed it was not consistent with the vegetation phenology at ES-Lma in May 2016. Therefore, GLASS LAI was used at these two sites, and the GLOBMAP product was used at the remaining five sites. In addition, the 8 d temporal resolution of the LAI data was interpolated into daily values using the nearest neighbor method.

2.4.2 Meteorological dataset

Standard hourly meteorological data were inputted into BEPS, including air temperature at 2 m, shortwave radiation, precipitation, relative humidity, and wind speed, taken from the FLUXNET database (for sites AT-Neu, DK-Sor, ES-Lma, FI-Hyy, and US-Ha1; see https://fluxnet.org, last access: June 2022), the AmeriFlux database (for sites US-Ha1 and US-Wrc; see https://ameriflux.lbl.gov, last access: June 2022), and the ERA5 dataset (for sites AT-Neu, IT-Soy, and US-Ha1; see 10.24381/cds.adbb2d47, Hersbach et al., 2023), respectively. Since the experiments were conducted at the site scale, we used as far as possible the FLUXNET and AmeriFlux data, which contain information about the downscaling of meteorological variables of the ERA-Interim reanalysis data product, and supplemented them with ERA5 reanalysis data (Pastorello et al., 2020). Although AT-Neu is a FLUXNET site, its FLUXNET meteorological data are only available for the years 2002–2012, while the measurement of COS was performed in 2015. Therefore, we first performed a linear fit of its ERA5-Land data and FLUXNET meteorological data for 2002–2012 and then corrected the ERA5 data for 2015 with the fitted parameters to obtain downscaling information for the meteorological variables. Additionally, for US-Ha1, we used the FLUXNET data in 2012 and the AmeriFlux data and ERA5 shortwave radiation data in 2013 to drive the BEPS model due to the absence of FLUXNET data in 2013 and the lack of shortwave radiation data of AmeriFlux.

2.4.3 Assimilation and evaluation datasets

The hourly ecosystem-scale COS flux observations were used to perform data assimilation experiments and to evaluate the assimilation results. They were taken from existing studies (listed in Table 1) and were available for at least 1 month. Most of the ecosystem COS flux observations were obtained using the eddy-covariance (EC) technique, with the exception of US-Ha1 and US-Wrc, where the COS fluxes were derived with the gradient-based approach (Baldocchi, 2003; Wu et al., 2015; Kohonen et al., 2020). The COS soil flux measurements were collected using soil chambers, except at US-Ha1, where the gradient-based approach was used. Detailed information about the COS measurements can be found in the publications listed in Table 1. Specifically, only the measured ecosystem COS fluxes of FI-Hyy (Vesala et al., 2022) were utilized in this study.

US-Wrc utilizes the gradient-based approach to measure COS ecosystem flux (Rastogi et al., 2018a). However, available data are limited to only COS concentration measurements and lack other parameters required; therefore this site risks introducing biases. Hence, a bias correction scheme was implemented to match the simulated and estimated ecosystem-scale COS fluxes for US-Wrc. The objectives of this correction scheme are to obviate the need for accurate values of parameters relevant for COS flux calculations and to retain as much useful information from the COS concentration measurements as possible (Leung et al., 1999; Scholze et al., 2016). This was done by using the mean ( $\overline{M}$ ) and standard deviation ( $σ_{M}$ ) of the simulated COS flux to correct the COS flux observations ( $O$ ):

11 $F = \frac{σ_{M} (O - \overline{O})}{σ_{O}} + \overline{M},$ where $\overline{O}$ and $σ_{O}$ are the mean and standard deviation of the observed COS flux series. $F$ is the corrected observed COS flux, and the COS simulations were calculated using the prior parameters for the time period corresponding to the COS flux observations. The standard deviation of the ecosystem COS fluxes within 24 h around each observation was calculated as an estimate of the observation uncertainty. For the case where there are no other observations within the surrounding 24 h, the uncertainty was taken as the mean of the estimated uncertainties of the whole observation series.

Due to the coupling between leaf exchange of COS, CO $_{2}$ and H $_{2}$ O, GPP, and ET data are selected to evaluate the model performance of COS assimilation in this study. In addition, we further explored the ability of COS to constrain $H$ simulations, since the transpiration contributes to a decrease in temperature within the leaf (Gates, 1968; Konarska et al., 2016), and the leaf–air temperature gradient is a key control factor of $H$ (Monteith and Unsworth, 2013; Dong et al., 2017). Moreover, SWC is used in model evaluation because it plays a key role in modeling $F_{COS, biotic}$ (as shown in Eq. 9) and because the water dissipated in transpiration originates from the soil (Berry et al., 2016). More detail will be provided in the Discussion (Sect. 4.3).

These data were taken from FLUXNET (DK-Sor, ES-Lma, FI-Hyy, and US-Ha1), AmeriFlux (US-Ha1 and US-Wrc), existing studies (Spielmann et al., 2020a, 2019a, and Rastogi et al., 2018a), and SMEAR (https://smear.avaa.csc.fi/, last access: May 2024). As only CO $_{2}$ turbulent flux (FC) data are available for US-Ha1 in 2013 and only net ecosystem exchange (NEE) data are available for IT-Soy, a night flux partitioning model was used to estimate ecosystem respiration ( $R_{eco}$ ) and thus to calculate GPP (Reichstein et al., 2005). The model assumes that nighttime NEE represents ecosystem respiration and thus partitions FC or NEE into GPP and $R_{eco}$ based on the semi-empirical models of respiration, which use air temperature as a driver (Lloyd and Taylor, 1994; Lasslop et al., 2012). Since ET observations are only available from FI-Hyy at https://smear.avaa.csc.fi/ (last access: May 2024), ET is derived from latent heat ( $LE$ ) as the ratio of $LE$ to the latent heat of vaporization ( $L_{w}$ ) (Pastorello et al., 2020). In this study, we use air temperature as a driver to calculate $L_{w}$ and subsequently ET (Bolton, 1980).

We hereby note that only the comparisons of COS and GPP results before and after assimilation are presented in the main text, while the evaluations of the simulated ET (Figs. S3 and S4), $H$ (Figs. S5 and S6), and SWC (Fig. S7) are included in the Supplement.

2.5 Experimental design

Three groups of data assimilation experiments were conducted in this study: (1) 14 model-based twin experiments were performed to investigate the ability of NUCAS to assimilate COS fluxes in different scenarios; (2) 13 single-site assimilation experiments were conducted at all seven sites to obtain the site-specific posterior parameters and the corresponding posterior model outputs based on COS flux observations; and (3) 1 two-site assimilation experiment was carried out to refine one set of parameters over two sites simultaneously and to simulate the corresponding model outputs. Prior simulations using default parameters were also performed in order to investigate the effect of the COS flux assimilation. Moreover, due to the limitation of the COS observations, all of these experiments were conducted in a 1-month time window at the peak of the growing season. Detailed information of these experiments is described in the following.

2.5.1 Twin experiment

Model-based twin experiments were performed to investigate the model performance of the data assimilation (Irrgang et al., 2017) at all seven sites considering single-site and two-site scenarios. In each twin experiment, we first created a pseudo-observation sequence by NUCAS using the prior parameters. The pseudo-observation time series included the prior simulated ecosystem COS fluxes with its uncertainties, and the latter were estimated as the standard deviation of the prior simulated COS fluxes within 24 h around each simulation. Then, a given perturbation ratio was applied to the prior parameters vector as a starting point for the interactive adjustment of parameter values to match the COS flux pseudo-observations. The effectiveness of the data assimilation methodology of NUCAS can be validated if it successfully restores the control parameters from the pseudo-observations. As a gradient-based optimization algorithm is used in NUCAS to tune the control parameters and minimize the cost function, the changes of cost function and gradient over assimilation processes can also be used to verify the assimilation performance of the system. In this work, a total of 14 twin experiments were conducted, including 13 single-site twin experiments and 1 two-site twin experiment. Regarding the uncertainty of parameters, a perturbation size of 0.2 was utilized in all of the twin experiments.

2.5.2 Real data assimilation experiment

After the ability of NUCAS to assimilate COS fluxes was confirmed by twin experiments, the system was then utilized to conduct data assimilation experiments with real COS observations under single-site and two-site conditions to optimize the control parameters and state variables of this model, and the evaluation dataset was used to test the posterior simulations of the state variables. For the single-site case, a total of 13 data assimilation experiments were conducted at all sites to investigate the assimilation effect of COS flux on optimizing key ecosystem variables. Detailed information about these single-site experiments is shown in Table 2.

Table 2

The configuration and the relative changes (%) of the parameters for each single-site assimilation experiment. The cost function reduction of each experiment is indicated by the reduction rate between the initial value of the cost function ( $J_{initial}$ ) and the final value of the cost function ( $J_{final}$ ), defined as $1 - J_{final} / J_{initial}$ , and $N_{COS}$ denotes the number of ecosystem COS flux observations.

Site name	Assimilation window	$N_{COS}$	Cost function reduction (%)	Relative change (%) of parameters
				$V_{cmax 25}$	VJ_slope	Ksat $_{scalar}$	$b_{scalar}$	f_leaf
AT-Neu	June 2015	493	16.39	67.69	5.10	15.57	$-$ 78.13	$-$ 1.01
DK-Sor	June 2016	509	9.46	50.77	$-$ 0.47	21.54	14.23	$-$ 5.97
ES-Lma	May 2016	445	15.70	127.80	35.18	37.08	$-$ 65.33	10.05
FI-Hyy	July 2013	506	5.88	25.50	$-$ 65.70	0.37	4.25	$-$ 7.89
	July 2014	504	20.17	$-$ 24.96	$-$ 26.39	3.82	16.24	$-$ 6.12
	August 2014	166	38.86	$-$ 24.84	$-$ 56.81	7.79	4.46	$-$ 1.52
	July 2015	492	5.53	6.43	$-$ 50.25	0.01	$-$ 0.06	0.26
	July 2016	430	4.37	11.47	$-$ 53.16	$-$ 0.17	$-$ 0.63	$-$ 0.37
	July 2017	527	2.84	21.70	$-$ 51.74	0.01	0.01	$-$ 6.98
IT-Soy	July 2017	250	4.87	32.55	13.15	21.60	$- 44.72$	$- 0.94$
US-Ha1	July 2012	333	44.14	$-$ 51.89	16.08	12.05	$-$ 43.31	$-$ 1.44
	July 2013	397	63.73	$-$ 58.67	10.16	16.93	$-$ 58.33	$-$ 1.71
US-Wrc	August 2014	701	27.71	$-$ 42.77	14.52	$-$ 1.04	2.45	$-$ 3.39

Single-site assimilation can fully account for the site-specific information and thus achieve accurate calibration. However, this assimilation approach often yields a range of different model parameters between sites. For large-scale model simulations, only one set of accurate and generalized model parameters is required (Salmon et al., 2022). Thus, it is necessary to conduct a two-site assimilation experiment that can assimilate COS observations from two sites simultaneously. Although both DK-Sor and US-Ha1 are dominated by deciduous broadleaved forest and both AT-Neu and ES-Lma are dominated by C $_{3}$ grass, none of the COS flux observations from these two PFTs overlap in observation time. We therefore selected FI-Hyy and US-Wrc, which are both dominated by evergreen needleleaf forest, and conducted a two-site assimilation experiment with a 1-month assimilation window in August 2014.

2.6 Model evaluation

For the purpose of demonstrating the process of the control parameter vector being continuously adjusted in the normalized parameter space in a twin experiment and quantifying the deviation of the current control vector from the prior, the distance ( $D_{x}$ ) between the parameter vector and the prior parameter vector was calculated.

12 $D_{x} = ∥x - x_{0}∥ = \sqrt{\sum_{i = 1}^{n} {(x (i) - x_{0} (i))}^{2}},$ where $i$ denotes the $i$ th parameter in the parameter vectors and $n$ denotes the number of parameters in the parameter vector and takes a value of 76.

With the aim of evaluating the performance of NUCAS in the real data assimilation experiments, we re-ran the model to obtain the posterior model outputs based on the posterior model parameters. Typical statistical metrics including mean bias (MB), root-mean-square error (RMSE), and coefficient of determination ( $R^{2}$ ) are used to measure the difference between the simulations and in situ observations. They were calculated as $\begin{matrix} 13 & MB = \frac{1}{N} \sum_{i = 1}^{N} (M_{i} - O_{i}) = \overline{M} - \overline{O} \\ 14 & RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(M_{i} - O_{i})}^{2}} \\ 15 & R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(M_{i} - O_{i})}^{2}}{\sum_{i = 1}^{N} {(O_{i} - \overline{O})}^{2}}, \end{matrix}$ where $M_{i}$ denotes the simulation corresponding to the $i$ th observation $O_{i}$ and $N$ is the total number of observations.

Additionally, in order to investigate the sensitivity of COS assimilation to the model parameters, we also calculated the sensitivity index (SI) for each parameter at the prior value based on the sensitivity information provided by the adjoint model. The SI of $i$ th parameter $x (i)$ of the parameter vector $x$ was calculated as 16 $SI (x (i)) = \frac{\partial J / \partial x (i)}{∥\partial J / \partial x∥},$ where $∥\partial J / \partial x∥$ denotes the norm of the sensitivity vector of the cost function to the model parameters.

3 Results

3.1 Twin experiments

After averaging about 18 and 13 evaluations of the cost function and its gradients, each of the twin experiments was successfully performed. Details of these twin experiments are shown in Table S5. In summary, during those assimilations, the cost function values were substantially reduced by more than 13 orders of magnitude from greater than 50.75 to less than $5.09 \times 10^{- 13}$ , and the respective gradient values were also reduced from greater than 38.81 to less than $1.56 \times 10^{- 6}$ , which verified the ability of the data assimilation algorithm to correctly complete the assimilation.

The relative changes of the parameters with respect to the prior values at the ends of the experiments are reported in Table S5, along with the initial values ( $D_{initial}$ ) and the final values ( $D_{final}$ ) of $D_{x}$ . Results show that the relative differences of these parameters from the “true” values reached exceedingly small values at the ends of twin experiments, with the maximum of the absolute values of the relative changes below $8.55 \times 10^{- 9}$ . $D_{x}$ was also reduced to nearly zero, where the maximum value was below $6.60 \times 10^{- 8}$ , which indicates that all parameters in the control parameter vectors were almost fully recovered from the pseudo-observations. In conclusion, these results demonstrate that NUCAS has excellent data assimilation capability under various scenarios and can effectively perform iterative computations to achieve reliable parameter optimization.

3.2 Single-site assimilation

With an average of approximately 97 cost function evaluations, all of the 13 single-site experiments were performed successfully. The experiments reduced cost function values substantially, with an average cost function reduction of 19.97 % (Table 2). However, the cost function reduction of the experiment varies considerably with PFT, site, and assimilation window, ranging from 2.84 % to 63.73 %. The cost function decreased dramatically at US-Ha1, with an average decrease of 53.93 %. In contrast, at IT-Soy, the cost function reduction is only 4.87 %. With the same PFT (C $_{3}$ grass), the cost function decreased by a similar degree at AT-Neu (16.39 %) and ES-Lma (15.70 %). The average cost function reduction at FI-Hyy (29.52 %) was also comparable to another evergreen needleleaf forest site, US-Wrc (27.71 %), in 2014. However, the cost function reduction of FI-Hyy varied notably from year to year. In July 2014 and August 2014, the cost function reductions were 20.17 % and 38.86 %, respectively, while in July of all other years the cost function reductions were much lower, ranging from 2.84 % to 5.88 %. Similarly to the single-site twin experiments, only five parameters have been efficiently adjusted in the single-site assimilation of real observations (Table 2).

The mean diurnal cycle and the scatterplots of observed and simulated COS fluxes are presented in Figs. 3 and S1, respectively. On average across all sites, the prior simulated and observed ecosystem COS fluxes were remarkably close, with 20.60 and 20.04 pmol m $^{- 2}$ s $^{- 1}$ , respectively. However, there was substantial variability between sites and even between experiments at the same site. At ES-Lma, the prior simulated COS fluxes were greatly underestimated by 63.38 %. In contrast, the prior simulated COS fluxes were overestimated at US-Ha1, with MBs of $- 10.01$ and $- 12.17$ pmol m $^{- 2}$ s $^{- 1}$ in July 2012 and July 2013. In general, the MBs of COS fluxes are largely determined by the simulations and observations in the daytime due to the larger magnitude (Fig. 3). However, the model–observation differences at nighttime are also non-negligible. As shown in Fig. 3, the underestimation is particularly evident at AT-Neu, ES-Lma, and FI-Hyy.

Figure 3

The diurnal cycle of observed (gray) and simulated COS flux using prior parameters (red) and single-site (blue) and two-site (green) posterior parameters. The size of the circle indicates the number of observations (ranging from 1 to 31) within each circle, and the error bars depict the standard deviations in the mean of observations from the variability within each circle if the number of corresponding observations is greater than 3. Lines connect the mean values of simulations, and pale bands depict the standard deviation in the mean of simulations from the variability within each bin.

[Figure omitted. See PDF]

After the single-site optimizations, both the daily variation and the diurnal cycle of COS simulations were improved. This was reflected in the reduction of the mean RMSE between the simulated and observed COS fluxes from 15.71 pmol m $^{- 2}$ s $^{- 1}$ in the prior case to 13.84 pmol m $^{- 2}$ s $^{- 1}$ in the posterior case. The RMSEs were also reduced in all single-site experiments. Moreover, the assimilation of COS fluxes also effectively corrected the bias between prior simulations and observations, with the mean absolute MB decreased from 5.06 to 3.15 pmol m $^{- 2}$ s $^{- 1}$ . In contrast, $R^{2}$ remained almost unchanged by the optimizations, with its mean value of 0.30 in both the prior and the posterior cases. Our results also showcase that the model–observation differences in COS fluxes were effectively reduced during the daytime. However, the remarkable differences between COS flux observations and simulations at nighttime are not effectively corrected in a number of assimilation experiments (i.e., the experiment conducted at FI-Hyy in July 2013; see Fig. 3d).

3.3 Two-site assimilation

FI-Hyy and US-Wrc have different soil textures: sandy loam and loam, respectively. In the two-site assimilation experiment, NUCAS accounted for this difference appropriately and successfully minimized the cost function from 499.56 to 358.81 after 70 evaluations of cost function. The cost function reduction for the experiment has a value of 28.17 %, comparable to the cost function reductions for corresponding single-site assimilation experiments at FI-Hyy and US-Wrc (38.86 % and 27.71 %). Furthermore, corresponding to these two soil textures, the texture-dependent parameters Ksat $_{scalar}$ and $b_{scalar}$ yielded two different posterior parameter values, respectively, so that a total of seven parameters were optimized in the two-site experiment (Table 3). It can be seen that the two-site optimized results of $V_{cmax 25}$ , VJ_slope, and f_leaf are similar to those of the single-site optimized results at US-Wrc, as most of the observations of the two-site experiment originated from US-Wrc. As for the texture-dependent parameters, they had the same signs and comparable magnitudes of the adjustments as those of the corresponding single-site experiment at FI-Hyy and were minutely adjusted at US-Wrc as in the corresponding single-site experiment. Overall, both the cost function reduction and the parameter optimization results of the two-site assimilation experiments were similar to the corresponding single-site experiments, demonstrating the ability of NUCAS to correctly perform joint data assimilation from COS observations at two sites simultaneously.

Table 3

The configuration and the relative changes (%) of the parameters for the two-site assimilation experiment at FI-Hyy and US-Wrc. $N_{COS}$ denotes the total number of ecosystem COS flux observations.

Site name	Assimilation window	$N_{COS}$	Cost function reduction (%)	Relative change (%) of parameters
				$V_{cmax 25}$	VJ_slope	Ksat $_{scalar}$	$b_{scalar}$	f_leaf
FI-Hyy	August 2014	867	28.17	$- 41.74$	3.36	12.57	5.57	$- 6.81$
US-Wrc	August 2014	867	28.17	$- 41.74$	3.36	$- 1.91$	2.75	$- 6.81$

The posterior simulations of COS flux using the two-site posterior parameters also demonstrated the ability of NUCAS to correctly assimilate two-site COS fluxes simultaneously (Figs. 3 and S1). As shown in Figs. 3f and 3m, the prior COS simulations for both the FI-Hyy site and US-Wrc site were overestimated in the daytime compared to the observations. After the two-site COS assimilation, the discrepancies between COS simulations and observations were reduced in both FI-Hyy and US-Wrc, with RMSE reductions of 18.42 % and 3.23 %, achieving similar results to the simulations using the single-site posterior parameters.

3.4 Parameter change

There are five parameters that were adjusted during the assimilation of COS flux observations by NUCAS, whether in twin, single-site, or two-site experiments. They are the maximum carboxylation rate at 25 °C ( $V_{cmax 25}$ ), the ratio of $V_{cmax}$ to the maximum electron transport rate $J_{\max}$ (VJ_slope), the scaling factor (Ksat $_{scalar}$ and $b_{scalar}$ ) of saturated hydraulic conductivity (Ksat) and the Campbell parameter (b), and the ratio of PAR to shortwave radiation (f_leaf). These parameters are strongly linked to the COS exchange processes, and it is therefore reasonable that they could be optimized by the assimilation of COS flux. Furthermore, these parameters are also closely linked to processes such as photosynthesis, transpiration, and soil water transport; therefore, the assimilation of COS flux holds the promise of improving the simulation of GPP, ET, $H$ , and SWC.

In both the single-site and the two-site experiments, $V_{cmax 25}$ was considerably adjusted, with average absolute relative changes of 42.08 % and 41.74 %, respectively (Fig. 4a). VJ_slope and $b_{scalar}$ also varied greatly in the single-site experiments, with mean absolute relative changes of 30.67 % and 25.55 %, respectively. However, in the two-site experiment, their mean absolute changes were much smaller at 3.36 % and 4.16 %. The relative changes of Ksat $_{scalar}$ are modest in both the single-site and two-site experiments, with mean absolute values of 10.61 % and 7.24 %, respectively. As for f_leaf, the average absolute relative changes are even smaller than those of Ksat $_{scalar}$ , at 3.67 % and 6.81 % in the single-site and two-site experiments. In addition, we found that the parameters can be tuned considerably in cases where the prior simulations are close to the observations. For example, at IT-Soy, where the prior simulations agree well with the observations and the cost function only decreases 4.87 % in the experiment, both $V_{cmax 25}$ and $b_{scalar}$ were remarkably tuned, with relative changes of 32.55 % and $- 44.72$ %.

Figure 4

(a) Relative changes of parameters for single-site experiments (bars) and for the two-site experiment (diamond points). (b) Sensitivity indexes of parameters at prior values. For sites where multiple single-site experiments were conducted, the ends of the error bars and the bar indicate the maximum, minimum, and mean of the relative changes of the parameters, respectively. For those sites lacking multi-year COS observations, no error bars were plotted. The color of each bar indicates the PFT/texture (Txt).

[Figure omitted. See PDF]

Across all single-site experiments, there are notable differences in the results of parameter optimization, especially in $V_{cmax 25}$ . For the single-site experiment at US-Ha1 in July 2013, the posterior value of $V_{cmax 25}$ is 55.28 % lower than the prior. In contrast, the posterior $V_{cmax 25}$ is 127.80 % higher than the prior at ES-Lma. In addition to $V_{cmax 25}$ , the relative changes of $b_{scalar}$ and VJ_slope also vary considerably, ranging from $- 78.13$ % to 16.84 % and $- 65.70$ % to 35.18 %, respectively. On the contrary, the posterior values of f_leaf show less variability and do not differ from the prior value by more than 10.05 % (note the difference in $x$ -axis scales).

3.5 Parameter sensitivity

The adjoint-based sensitivity analysis results of the parameters are illustrated in Fig. 4b. Our results suggest that $V_{cmax 25}$ has a critical impact on the assimilation results, followed by VJ_slope. With absolute SIs ranging from 87.76 % to 96.41 %, the mean absolute SI of $V_{cmax 25}$ is about 3 times that of VJ_slope, which is 29.71 %. In contrast, the average absolute SIs of $b_{scalar}$ , f_leaf, and Ksat $_{scalar}$ are much lower, with 11.54 %, 8.95 % and 3.05 %, respectively.

Unlike the great variability in the posterior $V_{cmax 25}$ and VJ_slope, the SIs of these two parameters are stable, especially at the same site. At US-Ha1, for example, the differences between the SIs of $V_{cmax 25}$ and VJ_slope in its two experiments were all smaller than 3.05 %. Furthermore, $V_{cmax 25}$ has the smallest magnitude of variation in SIs among the five parameters, with the standard deviation of the SIs being 2.62 %, despite the fact that its SIs are of a much larger order of magnitude. With the SIs ranging from 12.05 % to 45.71 % and 0.94 % to 14.43 %, VJ_slope and f_leaf also play important roles in the modeling of COS. As for Ksat $_{scalar}$ and $b_{scalar}$ , their SIs varied considerably across sites and even across experiments at the same site. For example, the absolute SIs of $b_{scalar}$ are as high as 30.80 % and 34.04 % at the C $_{3}$ grass sites AT-Neu and ES-Lma, respectively. On the contrary, the mean absolute SI of $b_{scalar}$ is only 2.59 % at FI-Hyy. However, the absolute SIs of $b_{scalar}$ at FI-Hyy vary considerably across the experiments, ranging from 0.06 % to 10.46 %.

Our results also suggest that f_leaf tends to play a more important role in the COS assimilation at the forest sites (including FI-Hyy, US-Ha1, and US-Wrc but not DK-Sor) compared to the sites with low-stature vegetation type (AT-Neu, ES-Lma, and IT-Soy), with the mean absolute SIs about 2 times those of the latter. With a mean absolute SI of 93.44 %, $V_{cmax 25}$ is also observed to be more sensitive at the forest sites. Specifically, the largest SI of $V_{cmax 25}$ was observed at DK-Sor, while the SIs of VJ_slope and f_leaf of DK-Sor are noticeably lower than those of other sites, at 12.05 % and 0.94 %, respectively.

3.6 Comparison and evaluation of simulated GPP

For single-site experiments, both the prior and posterior GPP simulations performed well in modeling the daily variation and diurnal cycle of GPP, with mean $R^{2}$ values of 0.83 and 0.81, respectively (Figs. 5 and S2). The discrepancy between GPP simulations and observations was substantially reduced by the assimilation of COS, from a mean RMSE of 6.71 umol m $^{- 2}$ s $^{- 1}$ in the prior case to 5.02 umol m $^{- 2}$ s $^{- 1}$ in the posterior case. Similarly to COS, the mean of the prior simulated GPP is also generally larger than that of the observed GPP. With the assimilation of COS, the bias between the observed and simulated GPP was effectively corrected, with the reduction in mean absolute MB from 3.83 to 2.40 umol m $^{- 2}$ s $^{- 1}$ .

Figure 5

The diurnal cycle of observed (gray) and simulated GPP using prior parameters (red) and single-site (blue) and two-site (green) posterior parameters. The size of the circle indicates the number of observations within each circle (ranging from 1 to 31), and the error bars depict the standard deviations in the mean of observations from the variability within each circle. Lines connect the mean values of simulations, and pale bands depict the standard deviation in the mean of simulations from the variability within each bin.

[Figure omitted. See PDF]

In general, the GPP performance was improved for most of the single-site experiments (12 of 13), with RMSE reductions ranging from 3.81 % to 58.56 %. Across all single-site experiments performed at evergreen needleleaf forest sites, the posterior GPP simulations were remarkably improved, with an averaged RMSE reduction of 37.05 %. At the deciduous broadleaf forest sites (DK-Sor and US-Ha1), the posterior simulated GPP also achieved a better fit with the GPP derived from EC observations, with an averaged RMSE reduction of 22.16 %. However, for experiments conducted on low-stature vegetation types (including C $_{3}$ grass and C $_{3}$ crop), the assimilation of COS is less effective in constraining the modeled GPP. At ES-Lma and IT-Soy, the RMSEs of the posterior simulated GPP are slightly lower than those of the prior, with reduction ratios of 8.60 % and 3.81 %, respectively. At AT-Neu, the assimilation of COS observations shifted the GPP simulations away from the GPP derived from EC observations, with the RMSE increasing from 3.48 to 5.97 umol m $^{- 2}$ s $^{- 1}$ (Fig. 5a).

Covering different years or months, the single-site experiments performed at FI-Hyy and US-Ha1 provided an opportunity to analyze inter-annual and seasonal variations in the simulated and observed GPP. At US-Ha1, the prior simulations overestimated GPP in both July 2012 and July 2013 by 21.26 % and 38.41 %, respectively. With the assimilation of COS, the modeled COS exhibited substantial decreases. In parallel, the model–observation difference in GPP also reduced by 12.36 % and 28.10 %, respectively. However, the posterior simulated GPP appeared to be underestimated by 20.08 %. At FI-Hyy, a total of six single-site experiments were conducted between 2013 and 2017, five of them in July and one in August 2014. The observed GPP shows little inter-annual variation in July from 2013 to 2017, with the mean ranging from 8.30 to 9.15 umol m $^{- 2}$ s $^{- 1}$ . In August 2014, the GPP observations were noticeably lower than that in July, with a mean of 6.43 umol m $^{- 2}$ s $^{- 1}$ . As for simulations, the model tends to overestimate GPP, with MBs ranging from 2.24 to 3.59 umol m $^{- 2}$ s $^{- 1}$ . After the assimilation of COS, the overestimation of the COS simulation for FI-Hyy was effectively corrected, with mean absolute MBs of 1.53 umol m $^{- 2}$ s $^{- 1}$ . However, with a low SWC in August 2014, the prior simulated COS was obviously overestimated by 37.04 %, which led to remarkable downward adjustments of $V_{cmax 25}$ and VJ_slope. Thus, the simulated GPP was also markedly downgraded by 55.38 % in August 2014, ultimately resulting in the underestimation of the single-site posterior simulated GPP (Fig. 5f).

In the two-site experiment, the model–observation differences in GPP for both FI-Hyy and US-Wrc were reduced by the assimilation of COS (Fig. 5f and m), with RMSE reductions of 42.96 % and 43.11 %, respectively. These RMSE reductions are even higher than those in the corresponding single-site experiments, by 35.21 % for FI-Hyy and 0.13 % for US-Wrc. These results suggest that simultaneous assimilation using COS observations from two sites can also improve GPP simulations, and the assimilation can be more robust than the single-site assimilation because the possibility of overfitting local noise is reduced.

Overall, the assimilation of ecosystem COS fluxes improved the simulation of GPP both in single-site experiments and in the two-site experiment. However, the assimilation effects vary considerably for different sites and even for different periods within the same site. Our results suggest the assimilation of COS is able to provide strong constraint to the modeling of GPP at forest sites, with an average RMSE reduction of 32.58 %. In contrast, at the sites with low-stature vegetation type (including C $_{3}$ grass and C $_{3}$ crop), the assimilation of COS is less effective in constraining the GPP simulations.

4 Discussion

4.1 Parameter changes

As mentioned before, our results show $V_{cmax 25}$ was tuned the most in both the single-site experiments and the two-site experiment, followed by VJ_slope and $b_{scalar}$ . This is because COS plant fluxes are much larger than COS fluxes of soil in general (Whelan et al., 2016, 2018; Spielmann et al., 2019a; Kooijmans et al., 2021; Ma et al., 2021; Maignan et al., 2021; Remaud et al., 2022) and because the soil-hydrology-related parameters cannot directly influence the COS plant uptake. Therefore, the assimilation of the COS flux mainly changed the parameters related to COS plant uptake rather than texture-dependent parameters that relate to soil COS flux to minimize the cost function. However, the adjustment of soil-hydrology-related parameters should not be neglected as well, as they play an important role in minimizing the discrepancy between COS simulations and observations.

As shown in Fig. 3, the prior simulations underestimated COS fluxes at nighttime for many sites, e.g., FI-Hyy. On the one hand, this is due to the substantial gap between current modeled COS soil fluxes and observations (Whelan et al., 2022). On the other hand, this also stems from the fact that the nighttime stomatal conductance was set to a low and constant value (1 mmol m $^{- 2}$ s $^{- 1}$ ) in the BEPS model. As a result, the discrepancy between nighttime ecosystem COS simulations and observations could not be reduced by adjusting photosynthesis-related parameters to have an effect on stomatal conductance modeling. Thus, soil-hydrology-related parameters were adjusted to compensate for the differences in both soil and plant components simultaneously. In this study, the COS soil model proposed by Whelan et al. (2016, 2022) was utilized, in which the optimal SWC for soil COS biotic uptake was set to 12.5 % for grass. Such an optimal SWC value is much lower than the prior simulated SWC, as shown in Fig. S7a and c. Therefore, the soil-hydrology-related parameters were considerably tuned at AT-Neu and ES-Lma, resulting in a rapid decline in the posterior SWC simulation to a level comparable to the optimum SWC. COS plant uptake is governed by the hydrolysis reaction of COS (Wohlfahrt et al., 2012), catalyzed by CA, though it can also be degraded by other photosynthetic enzymes, e.g., Rubisco (Lorimer and Pierce, 1989; Ma et al., 2021), and the reaction is not dependent on light (Stimler et al., 2011; Whelan et al., 2018). However, given that stomatal conductance is simulated from net photosynthetic rate with a modified version (Woodward et al., 1995; Ju et al., 2010) of the BB model (Ball et al., 1987) in BEPS, the adjustment of light-reaction-related parameters (VJ_slope and f_leaf) can therefore indirectly affect the simulation of COS plant uptake by influencing the calculation of stomatal conductance. According to Ryu et al. (2018), f_leaf varies little in reality and is usually between 41 % and 53 % on an annual mean scale. In our assimilation experiments, the optimized f_leaf values were distributed between 42.92 % and 51.28 %, consistent with this study. In contrast, the other light-reaction-related parameter, VJ_slope, has a much wider range of variation, with relative changes ranging from $- 65.70$ % to 35.18 %.

We noticed remarkably different optimization results for photosynthesis-related parameters in the experiments conducted in July 2013 and July 2014 at FI-Hyy, especially for $V_{cmax 25}$ and VJ_slope. In these two experiments, the difference in the relative change in both $V_{cmax 25}$ and VJ_slope is more than 39 %. However, these different adjustments to the parameter set caused a similar impact on COS simulations, leading to the latter being reduced by 13.38 % and 24.22 % in July 2013 and July 2014, respectively. These results revealed the “equifinality” (Beven, 1993) of the inversion problem at hand, i.e., the fact that different combinations of parameter values can achieve a similar fit to the COS observations. Assimilation of further observational data streams is expected to reduce the level of equifinality by differentiating between such combinations of parameter values that achieve a similar fit to COS observations.

4.2 Parameter sensitivity

It has been proven that photosynthetic capacity simulated by terrestrial ecosystem models is highly sensitive to $V_{cmax}$ , $J_{\max}$ , and light conditions (Zaehle et al., 2005; Bonan et al., 2011; Rogers, 2014; Sargsyan et al., 2014; Koffi et al., 2015; Rogers et al., 2017). Therefore, it is expected that $V_{cmax 25}$ , VJ_slope, and f_leaf would markedly affect the optimization results, as these parameters ultimately have an impact on the simulation of plant COS uptake by influencing the estimation of photosynthesis capacity and stomatal conductance. Specifically, the results of Wang et al. (2004), Verbeeck et al. (2006), Staudt et al. (2010), Han et al. (2020), and Ma et al. (2022) showed that the simulated photosynthetic capacity was generally more sensitive to $J_{\max}$ and light conditions than to $V_{cmax}$ . However, due to the differences in the physiological mechanisms of COS plant uptake and photosynthesis, e.g., the hydrolysis reaction of COS by CA is not dependent on light, the sensitivities of the two processes with respect to the model parameters may differ considerably although they are tightly coupled. Indeed, our adjoint sensitivity results suggest that the same change in $V_{cmax 25}$ is capable of influencing the assimilation results to a greater extent than a change in VJ_slope and f_leaf would. This result can be attributed to the model structure of $V_{cmax 25}$ that not only affects the estimation of stomatal conductance through photosynthesis but is also used to characterize mesophyll conductance and CA activity due to their linear relationships with $V_{cmax}$ (Badger and Price, 1994; Evans et al., 1994; Berry et al., 2013). In addition, such a large sensitivity to $V_{cmax 25}$ also indicates the importance of accurate modeling of the apparent conductance of COS for ecosystem COS flux simulation.

As for Ksat $_{scalar}$ and $b_{scalar}$ , they also play an important role in the assimilation of COS, since the SWC simulations of BEPS are sensitive to Ksat and b (Liu et al., 2011) and SWC is the primary factor for COS soil biotic flux modeling (Whelan et al., 2016). However, as the soil COS exchange is generally much smaller than COS plant uptake (Whelan et al., 2018) and the parameter scheme provided by Whelan et al. (2022) sets different empirical parameter values (see Table S3 for details) depending on the PFTs, the SIs of Ksat $_{scalar}$ and $b_{scalar}$ differ considerably across PFTs and are overall lower than those of photosynthesis-related parameters.

In Sect. 3.5, we mentioned that the radiation-related parameter f_leaf tends to play a more essential role in the assimilation of COS at the forest sites. Similar findings by Sun et al. (2019) found that the simulated GPP was more sensitive to radiation at forested vegetation types and less sensitive at low-stature vegetation types. In particular, the simulated GPP was also found to be highly sensitive to variations in radiation at low-radiation conditions (Koffi et al., 2015).

4.3 Impacts of COS assimilation on ecosystem carbon, energy, and water cycles

Due to the physiological basis that COS is taken up by plants through the same pathway of stomatal diffusion as CO $_{2}$ , the assimilation of COS was expected to optimize the simulation of GPP. It was confirmed by our single-site and two-site experiments conducted in a variety of ecosystems, including evergreen needleleaf forest, deciduous broadleaf forest, C $_{3}$ grass, and C $_{3}$ crop. However, as it is limited by many factors, such as the observation errors of the COS fluxes, the assimilation of COS does not always improve the simulation of GPP, e.g., at AT-Neu site.

Similarly to photosynthesis, transpiration is also coupled with the COS plant uptake through stomatal conductance. The difference is that, after CO $_{2}$ is transported to the chloroplast surface, it continues its journey inside the chloroplast and is eventually assimilated in the Calvin cycle (Wohlfahrt et al., 2012; Kohonen et al., 2022a). Based on the BB model, photosynthesis-related parameters only indirectly influence the calculation of stomatal conductance through photosynthesis in our model. Thus, ET was not optimized as dramatically as GPP in the assimilation of COS. In comparison, the RMSEs of GPP simulations were reduced by an average of 23.54 % as a result of the assimilation of COS, but ET was reduced by only 16.96 %. Moreover, as the transpiration rate and leaf temperature change show a linear relationship (Kümmerlen et al., 1999; Prytz et al., 2003) and surface–air temperature difference is a key control factor for sensible heat fluxes (Campbell and Norman, 2000; Arya, 2001; Jiang et al., 2022), the optimization for transpiration can therefore improve the simulation of leaf temperature and consequently improve the simulation of sensible heat flux.

Driven by the difference in water potential between the atmosphere and the substomatal cavity (Manzoni et al., 2013), the water is taken up by the roots, flows through the xylem, and exits through the leaf stomata to the atmosphere in the soil–plant–atmosphere continuum via evapotranspiration (Daly et al., 2004). Thus, when plants transpire, the water potential next to the roots decreases, driving water from bulk soil towards roots (Carminati et al., 2010) and reducing soil moisture. Certainly, soil moisture dynamics are also influenced by soil evaporation and leakage during inter-storm periods under ideal conditions (Daly et al., 2004). However, studies have shown that transpiration represents 80 % to 90 % of terrestrial evapotranspiration (Jasechko et al., 2013) and that evaporation is typically a small fraction of transpiration for well-vegetated ecosystems (Scholes and Walker, 1993; Daly et al., 2004). Based on current knowledge of leakage, for example, the relationship between leakage and the behavior of hydraulic conductivity (Clapp and Hornberger, 1978), extremely small adjustments of Ksat and b (i.e., with relative changes of 0.0057 % for Ksat $_{scalar}$ and $- 0.057$ % for $b_{scalar}$ in July 2015 at FI-Hyy) hardly caused any change in leakage. Therefore, our results indicate that the assimilation of COS can not only markedly improve the modeling of stomatal conductance and transpiration, but it can also ultimately impact SWC predictions. However, our results also show that there are obvious discrepancies between the ecosystem COS flux simulations and observations and that discrepancies cannot be effectively reduced by the adjustment of the photosynthesis-related parameters due to the simplification of BEPS for nighttime stomatal conductance modeling. As a result, it was also observed that the soil-hydrology-related parameters were drastically adjusted to minimize the discrepancy of COS simulations and observations, which instead biased the SWC simulations away from observations, for example, as shown in Figs. S7a and c.

4.4 Impacts of leaf area index data on parameter optimization

As essential input data of the BEPS model, LAI products have been demonstrated to be a source of uncertainty in the simulation of carbon and water fluxes (Liu et al., 2018). Therefore, it is necessary to investigate the influence of LAI on our parameter optimization results, as the LAI is directly related to the simulation of COS, and the discrepancy between COS simulations and COS observations is an essential part of the cost function. Here we collected three widely used satellite-derived LAI products (GLOBMAP, GLASS, and MODIS) and the means of the in situ LAI during the growing seasons or during the COS measurement periods for these sites (see Table 1). These in situ LAI means were used to drive the BEPS model along with the other three satellite-derived LAI products, with the assumption that they are representative of the LAI values during the assimilation periods. The configurations of these assimilation experiments were the same as those listed in Table 2 so that a total of 52 single-site experiments were conducted. All experiments were successfully performed, and the results are shown in Figs. 6 and S8.

Figure 6

Influence of LAI on the posterior $V_{cmax 25}$ obtained by the single-site experiments conducted at seven sites and driven by four LAI datasets (GLOBMAP, GLASS, MODIS, and in situ). The posterior $V_{cmax 25}$ and the LAI were represented by their normalized values, $N_{V_{cmax 25}}$ and $N_{LAI}$ , respectively. The posterior parameters were normalized by their prior values, and the LAI was normalized by the in situ values. The best-fit linear regression line of the posterior parameters obtained based on the satellite-derived LAI (GLOBMAP, GLASS, and MODIS) with the corresponding LAI data is shown, with the 95 % confidence interval spread around the line.

[Figure omitted. See PDF]

We found that the posterior $V_{cmax 25}$ significantly correlated with the LAI ( $R^{2} = 0.22$ , $P < 0.01$ ), while there was no apparent relationship between the optimization results of the other four parameters and the LAI. As mentioned before, the LAI is directly related to the simulation of COS and thus influences the posterior values of the parameters. Therefore, the correlations of LAI with these parameters reflect the robustness of the constraint abilities of COS assimilation with respect to them. These results suggest that the assimilation of COS is able to provide strong constraints on $V_{cmax 25}$ , while it constrains other parameters (VJ_slope, Ksat $_{scalar}$ , $b_{scalar}$ , and f_leaf) weakly, although they are also considerably changed by the assimilation. In conclusion, our results suggest that the uncertainty in satellite-derived LAI can not only exert large impacts on the modeling of water–carbon fluxes (Wang et al., 2021), but it is an important source of uncertainty in the parameter optimization results when performing data assimilation experiments with ecosystem models driven by LAI.

4.5 Caveats and implications

In general, we found that the assimilation of ecosystem COS fluxes can improve the model performance for GPP, ET, and $H$ for both the single-site assimilation and the two-site assimilation. Nonetheless, there are currently limitations that affect the use of COS data for the optimization of parameters, processes, and variables related to water–carbon cycling and energy exchange in terrestrial ecosystem models.

The assimilation of COS fluxes relies on the availability and quality of field observations. As both COS plant uptake and COS soil exchange are modeled within NUCAS and the data assimilation was performed at the ecosystem scale, a large number of accurate measurements of both COS soil flux and COS plant flux are essential for COS assimilation and model evaluation. However, at present, we face a serious lack of COS measurements (Brühl et al., 2012; Wohlfahrt et al., 2012). More laboratory and field measurements are needed for better understanding of the mechanistic processes of COS. Besides, the existing COS fluxes were calculated based on different measurement methods and data processing steps, which poses considerable challenges for comparing COS flux measurements across sites. In particular, as only raw COS concentrations were provided and a correction approach was employed, the estimated COS fluxes at US-Wrc may be subject to considerable uncertainties. A standardization of the measurement and processing techniques of COS is therefore urgently needed (Kohonen et al., 2020).

In this study, the prior uncertainty of observation was estimated by the standard deviation of ecosystem COS fluxes within 24 h with the assumption of a normal distribution. However, Hollinger and Richardson (2005) suggested that flux measurement error more closely follows a double exponential than a normal distribution. Kohonen et al. (2020) showed that the overall uncertainty in the COS flux varies with the sign (uptake or release) and the magnitude of the COS flux. Furthermore, there is a lack of understanding of the prior uncertainty for certain model parameters, such as VJ_slope, which makes the uncertainty estimates subject to potentially large errors. In conclusion, we should be more careful in considering the distribution and the magnitude of the prior uncertainty of observations and parameters.

The spatial and temporal variation in atmospheric COS concentrations has a considerable influence on the COS plant uptake (Ma et al., 2021) due to the linear relationship between the two (Stimler et al., 2010). The typical seasonal amplitude of atmospheric COS concentrations is $\sim$ 100–200 parts per trillion (ppt) around an average of $\sim$ 500 ppt (Montzka et al., 2007; Kooijmans et al., 2021; Hu et al., 2021; Ma et al., 2021; Belviso et al., 2022). However, in NUCAS, COS mole fractions in the bulk air are currently assumed to be spatially invariant over the globe and to vary annually, which may introduce substantial errors into the parameter calibration. Kooijmans et al. (2021) confirmed that modifying the COS mole fractions to vary spatially and temporally markedly improved the simulation of ecosystem COS flux. Thus, we suggest taking into account the variations in COS concentration and their interaction with surface COS fluxes at high spatial and temporal resolution in order to achieve better parameter calibration.

Currently, there are still uncertainties in the simulation of COS fluxes by BEPS, particularly for nighttime COS fluxes. As the nighttime COS plant uptake is driven by stomatal conductance (Kooijmans et al., 2021), nighttime COS fluxes can therefore be used to test the accuracy of the model settings for nighttime stomatal conductance ( $g_{n}$ ). In the BEPS model, a low and constant value (1 mmol m $^{- 2}$ s $^{- 1}$ ) of $g_{n}$ was set for all PFTs. Our simulations of nighttime COS flux indicate that, in BEPS, $g_{n}$ is underestimated to different degrees for different sites. Similar findings by Resco De Dios et al. (2019) showed that the median $g_{n}$ in the global dataset was 40 mmol m $^{- 2}$ s $^{- 1}$ . Therefore, the utilization of COS to directly optimize stomatal-related parameters should be investigated. Cho et al. (2023) proved the effectiveness of optimizing the minimum stomatal conductance and other parameters by the assimilation of COS. As different enzymes have different physiological characteristics, Cho et al. (2023) proposed a new temperature function for the CA enzyme and showcased the considerable difference in temperature response of enzymatic activities of CA and Rubisco, which provided valuable insights into the modeling and assimilation of COS. In addition, soil COS exchange is an important source of uncertainty in the use of COS as a carbon–water cycle tracer, since CA activity occurs in the soil as well (Kesselmeier et al., 1999; Smith et al., 1999; Ogée et al., 2016; Meredith et al., 2019). Kaisermann et al. (2018) showed that COS hydrolysis rates were linked to microbial C biomass, whilst COS production rates were linked to soil nitrogen content and mean annual precipitation (MAP). Interestingly, MAP was also suggested to be the best predictor of $g_{n}$ by Yu et al. (2019), who found that plants in locations with lower rainfall conditions had higher $g_{n}$ . Therefore, by using the global microbial C biomass, soil nitrogen content, and MAP datasets; the relationships between these variables; and the associated COS exchange processes, it is to be expected that a more accurate modeling of terrestrial ecosystem COS fluxes could be achieved, further increasing our understanding of the global COS budget and facilitating the assimilation of COS fluxes.

5 Conclusions

Over the past decades, considerable efforts have been made to obtain field observations of COS ecosystem fluxes and to describe empirically or mechanistically COS plant uptake and soil exchange, which offer the possibility of investigating the ability of assimilating ecosystem COS flux to optimize parameters and variables related to the water and carbon cycles and to energy exchange. In this study, we introduced NUCAS, a system which was developed based on the BEPS model and was designed to have the ability to assimilate ecosystem COS fluxes. In NUCAS, a resistance analog model of COS plant uptake and an empirical model of soil COS flux were embedded in the BEPS model to achieve the simulation of ecosystem COS flux, and a gradient-based 4D-Var data assimilation algorithm was implemented to optimize the internal parameters of BEPS.

A total of 14 twin experiments, 13 single-site experiments, and 1 two-site experiment covering the period from 2012 to 2017 were conducted to investigate the capability of NUCAS to assimilate COS fluxes and to optimize input parameters and simulated variables. COS flux observations from a range of ecosystems were used, including four PFTs and three soil textures. Our results show that NUCAS has the ability to optimize parameter vectors and that the assimilation of COS can constrain parameters affecting the simulation of carbon and water cycles and of energy exchange and thus effectively improve the performance of the BEPS model. We found that there is a tight link between the assimilation of COS fluxes and the optimization of ET which demonstrates the role of COS as an indicator of stomatal conductance and transpiration. The improvement of ET can further improve the model performance for $H$ , although the propagation of the optimization effect is subject to some limitations. These results highlight the broad perspective of COS as a tracer for improving the simulation of variables related to stomatal conductance. Furthermore, we demonstrated that COS can provide a strong constraint on $V_{cmax 25}$ , whereas the adjustment of parameters related to the soil hydrology appears to compensate for weaknesses in the model, i.e., the nighttime stomatal conductance set in the BEPS model. We also proved the strong impact of LAI on the parameter optimization results, emphasizing the importance of developing more accurate LAI products for models driven by observed LAI. In addition, we made a number of recommendations for future improvement of the assimilation of COS. In particular, we flagged the need for more observations of COS and suggested better characterization of observational and prior parameter uncertainties, the use of varying COS concentrations, and the refinement of the model for COS fluxes of soil. Specifically, with the lack of separate COS plant and soil flux data, the ecosystem-scale COS flux observations were utilized in this study. However, we believe that assimilating the component fluxes of COS individually should be pursued in the future, as this assimilation approach would provide separate constraints on different parts of the model. We expect the observational information on the partitioning between the two flux components to provide a stronger constraint than using just their sum.

Our two-site setup constitutes a challenge for the assimilation system, the model, and the observations. In this setup, the assimilation system has to determine a parameter set that achieves a fit to the observations at both sites, and NUCAS passes this important test. It should be noted that NUCAS was designed as a platform that integrates multiple data streams to provide a consistent map of the terrestrial carbon cycle, although only ecosystem COS fluxes were used to evaluate the performance of NUCAS in this study. The two-site assimilation experiment conducted in this study gives us more confidence that the calibrated model will provide a reasonable parameter set and posterior simulation throughout the plant functional type. In other words, what we present here is a pre-requisite for applying the model and assimilation system at regional to global scales.

We noticed the optimization of model parameters faced the challenge of equifinality due to the complexity of the model and the limited observation data. However, the equifinality can be avoided by imposing additional observational constraints (Beven, 2006). Indeed, using several different data streams to simultaneously (Kaminski et al., 2012; Schürmann et al., 2016; Scholze et al., 2016; Wu et al., 2018; Scholze et al., 2019) or step-wise (Peylin et al., 2016) constrain multiple processes in the carbon cycle is becoming a focus area in carbon cycle research. Therefore, it is necessary to combine COS with other observations to constrain different ecosystem processes and/or exploit multiple constraints on the same processes in order to achieve better modeling and prediction of the ecosystem water–carbon cycle and energy exchange.

Appendix A Stomatal conductance and soil hydrology modeling in BEPS, including parameters to be optimized

In the BEPS model, the leaf stomatal conductance to water vapor ( $g_{sw}$ in mol m $^{- 2}$ s $^{- 1}$ ) is estimated using a modified version of the BB empirical model (Ball et al., 1987) following Woodward et al. (1995): A1 $g_{sw} = b_{H_{2} O} + \frac{m_{H_{2} O} A R_{h} f_{w}}{C_{a}},$ where $b_{H_{2} O}$ is the intercept of the BB model, representing the minimum $g_{sw}$ (mol m $^{- 2}$ s $^{- 1}$ ); $m_{H_{2} O}$ is the empirical slope parameter in the BB model (unitless); $R_{h}$ is the relative humidity at the leaf surface (unitless); $f_{w}$ is a soil moisture stress factor describing the sensitivity of $g_{sw}$ to soil water availability (Ju et al., 2006); and $C_{a}$ is the atmospheric CO $_{2}$ concentration ( $µ$ mol mol $^{- 1}$ ). The net photosynthesis rate ( $A$ ) is calculated using the Farquhar model (Farquhar et al., 1980; Chen et al., 1999): $\begin{matrix} A2 & A = min (A_{i}, A_{j}) - R_{d} \\ A3 & A_{c} = V_{cmax} \frac{C_{i} - Γ_{i}^{*}}{C_{i} + K_{c} (1 + \frac{O_{i}}{K_{o}})} \\ A4 & A_{j} = J \frac{C_{i} - Γ_{i}^{*}}{4 (C_{i} - 2 Γ_{i}^{*})}, \end{matrix}$ where $A_{i}$ and $A_{j}$ are Rubisco-limited and RuBP-limited gross photosynthetic rates ( $µ$ mol m $^{- 2}$ s $^{- 1}$ ), respectively. $R_{d}$ is leaf dark respiration ( $µ$ mol m $^{- 2}$ s $^{- 1}$ ). $V_{cmax}$ is the maximum carboxylation rate of Rubisco ( $µ$ mol m $^{- 2}$ s $^{- 1}$ ); $J$ is the electron transport rate ( $µ$ mol m $^{- 2}$ s $^{- 1}$ ); Ci and Oi are the intercellular carbon dioxide (CO $_{2}$ ) and oxygen (O $_{2}$ ) concentrations (mol mol $^{- 1}$ ), respectively; and Kc and Ko are Michaelis–Menten constants for CO $_{2}$ and O $_{2}$ (mol mol $^{- 1}$ ), respectively.

The electron transport rate, J, is dependent on incident photosynthetic photon flux density (PPFD; $µ$ mol m $^{- 2}$ s $^{- 1}$ ) as A5 $J = \frac{J_{\max} I}{I + 2.1 J_{\max}},$ where $J_{\max}$ is the maximum electron transport rate ( $µ$ mol m $^{- 2}$ s $^{- 1}$ ) and $I$ is the incident PPFD calculated from the incident shortwave radiation $R_{SW}$ (W m $^{- 2}$ ): A6 $I = β R_{SW} f_leaf,$ where $β = 4.55$ is the energy–quanta conversion factor ( $µ$ mol J $^{- 1}$ ) and f_leaf is the ratio of photosynthetically active radiation to the shortwave radiation (unitless).

The maximum carboxylation rate of Rubisco $V_{cmax}$ was calculated according to the Arrhenius temperature function and the maximum carboxylation rate of Rubisco at 25 °C ( $V_{cmax 25}$ ). $V_{cmax}$ is generally proportional to leaf nitrogen content. Considering that the fractions of both sunlit and shaded leaf areas to the total leaf area and the leaf nitrogen content vary with the depth into the canopy ( $L$ ), the $V_{cmax}$ values of sunlit ( $V_{cmax, sun}$ ) and shaded ( $V_{cmax, sh}$ ) leaves can be obtained through vertical integrations with respect to $L$ (Chen et al., 2012): $\begin{matrix} A7 & V_{cmax, sunlit} = V_{cmax} χ_{n} N_{leaf} \frac{k [1 - e^{(k_{n} + k) L}]}{(k_{n} + k) (1 - e^{- k L})} \\ A8 & \begin{aligned} V_{cmax, shaded} & = V_{cmax} χ_{n} N_{leaf} \\ \times \frac{\frac{1}{k_{n}} [1 - e^{- K_{n} L}] - \frac{1}{k_{n} + k} [1 - e^{(k_{n} + k) L}]}{L - \frac{1}{k} (1 - e^{- k L})}, \end{aligned} \end{matrix}$ where $χ_{n}$ (m $^{2}$ g $^{- 1}$ ) is the relative change in $V_{cmax}$ to leaf nitrogen content; $N_{leaf}$ (g m $^{- 2}$ ) is the leaf nitrogen content at the top of the canopy; and $k_{n}$ (unitless) is the leaf nitrogen content decay rate with increasing depth into the canopy, taken as 0.3. $k$ is calculated as A9 $k = G (θ) Ω cos⁡ (θ),$ where $G (θ)$ is the projection coefficient, taken as 0.5; $Ω$ is the clumping index; and $θ$ is the solar zenith angle.

After $V_{cmax}$ values for the representative sunlit and shaded leaves are obtained, the maximum electronic transport rate for the sunlit and shaded leaves is obtained from Medlyn et al. (1999): A10 $J_{\max} = VJ_slope V_{cmax} - 14.2 .$ The soil water availability factor $f_{w, i}$ in each layer $i$ is calculated as A11 $f_{w, i} = \frac{1.0}{f_{i} (ψ_{i}) f_{i} (T_{s, i})},$ where $f_{i} (ψ_{i})$ is a function of matrix suction $ψ_{i}$ (m) (Zierl, 2001) and $f_{i} (T_{s, i})$ is a function describing the effect of soil temperature ( $T_{s, i}$ in °C) on soil water uptake (Bonan, 1991).

To consider the variable soil water potential at different depths, the scheme of Ju et al. (2006) was employed to calculate the weight of each layer ( $w_{i}$ ) to $f_{w}$ : A12 $w_{i} = \frac{R_{i} f_{w, i}}{\sum_{i = 1}^{n} R_{i} f_{w, i}},$ where $n$ is the soil layer number (five were used in this study) of the BEPS model and $R_{i}$ is the root fraction in layer $i$ , calculated as A13 $R_{i} = \{\begin{cases} 1 - r_{decay}^{100 {cd}_{i}} & i = 1 \\ r_{decay}^{100 {cd}_{i - 1}} - r_{decay}^{100 {cd}_{i}} & 1 < i < n \\ r_{decay}^{100 {cd}_{i - 1}} & i = n \end{cases},$ where ${cd}_{i}$ is the cumulative depth (m) of layer $i$ . In this study, each soil layer depth (from top to bottom) of the BEPS model is 0.05, 0.10, 0.20, 0.40, and 1.25 m, respectively.

The overall soil water availability $f_{w}$ is then calculated as A14 $f_{w} = \sum_{i = 1}^{n} f_{w, i} w_{i} .$ The hydraulic conductivity of each soil layer $K_{i}$ (m s $^{- 1}$ ) is expressed as A15 $K_{i} = {Ksat}_{i} {(\frac{{SWC}_{i}}{θ_{s, i}})}^{2 b_{i} + 3},$ where Ksat $_{i}$ is the saturated hydrological conductivity of soil layer $i$ (m s $^{- 1}$ ); SWC $_{i}$ is the volumetric liquid soil water content of soil layer $i$ (m s $^{- 1}$ ); $θ_{s, i}$ is the porosity of soil layer $i$ (unitless); and $b_{i}$ is the Campbell parameter for soil layer $i$ , determining the change rate of hydraulic conductivity with SWC (unitless). In this study, Ksat $_{i}$ and $b_{i}$ are expressed as $\begin{matrix} A16 & {Ksat}_{i} = {Ksat}_{scalar} {Ksat}_{df, i} \\ A17 & b_{i} = b_{scalar} b_{df, i}, \end{matrix}$ where Ksat $_{df, i}$ and $b_{df, i}$ are the default values of Ksat $_{i}$ and $b_{i}$ , respectively.

Code availability

The source code for BEPS is publicly available at https://doi.org/10.5281/zenodo.10667005 (Wu, 2023), and the adjoint code for BEPS is available upon request from the correspondence author ([email protected]).

Data availability

Measured eddy-covariance carbonyl sulfide flux data can be found at https://doi.org/10.5281/zenodo.3993111 (Spielmann et al., 2020b) for AT-Neu; https://doi.org/10.5281/zenodo.3406990 (Spielmann et al., 2019b) for DK-Sor, ES-Lma, and IT-Soy; https://doi.org/10.5281/zenodo.6940750 (Kohonen et al., 2022b) for FI-Hyy; and from the Harvard Forest Data Archive under record HF214 (https://doi.org/10.6073/pasta/7ed7b4d1fc7ad303998e76143a3b279a; Commane et al., 2016) for US-Ha1.The raw COS concentration data of US-Wrc can be obtained at https://doi.org/10.5281/zenodo.1422820 (Rastogi et al., 2018b). The meteorological data can be obtained from the FLUXNET database (https://fluxnet.org/, last access: June 2022) for AT-Neu (10.18140/FLX/1440121, Georg et al., 2020; Pastorello et al., 2020), DK-Sor (10.18140/FLX/1440155, Andreas and Kim, 2020), ES-Lma (10.18160/FDSD-GVRS, Migliavacca et al., 2020), FI-Hyy (10.18140/FLX/1440158, Ivan et al., 2020), and US-Ha1 (10.18140/FLX/1440071, Harvard University, 2020); from the AmeriFlux database (https://ameriflux.lbl.gov/, last access: June 2022) for US-Ha1 (10.17190/AMF/1871137, Munger, 2022) and US-Wrc (10.17190/AMF/1246114, Wharton, 2016); and from the ERA5 dataset (https://doi.org/10.24381/cds.adbb2d47; Hersbach et al., 2023) for AT-Neu, IT-Soy, and US-Ha1. The evaluation data can be obtained from the FLUXNET database for DK-Sor, ES-Lma, FI-Hyy, and US-Ha1; from the AmeriFlux database for US-Ha1 and US-Wrc; from https://doi.org/10.5281/zenodo.3993111 (Spielmann et al., 2020b) for AT-Neu; from https://smear.avaa.csc.fi/ (last access: May 2024) for FI-Hyy; from https://doi.org/10.5281/zenodo.3406990 (Spielmann et al., 2019b) for IT-Soy; and from https://doi.org/10.5281/zenodo.1422820 (Rastogi et al., 2018b) for US-Wrc. The $H$ and $LE$ data of AT-Neu and IT-Soy are provided by Felix M. Spielmann and Georg Wohlfahrt. The GLOBMAP LAI is available at https://doi.org/10.5281/zenodo.4700264 (Liu et al., 2021), the GLASS LAI is available at https://doi.org/10.12041/geodata.GLASS_LAI_MODIS(0.05D).ver 1.db (Xiao et al., 2016), and the MODIS LAI product is available at https://doi.org/10.5067/MODIS/MOD15A2H.006 (Myneni et al., 2015). All datasets used in this study and the model outputs are available upon request.

The supplement related to this article is available online at: https://doi.org/10.5194/gmd-17-6337-2024-supplement.

Author contributions

MW designed the experiments and developed the model. MV and TK developed the data assimilation layer including the adjoint code for the ecosystem model. HZ wrote the original paper and made the analysis. All authors contributed to the writing of the paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

We thank everyone that contributed to the collection of data. Specifically, we acknowledge Felix M. Spielmann and Georg Wohlfahrt for providing $H$ and $LE$ data for AT-Neu and IT-Soy. Michael Vossbeck and Thomas Kaminski thank Laurent Hascoët for supporting this activity. The authors thank the reviewers for their highly valuable comments.

Financial support

This research has been supported by the National Key Research and Development Program of China (grant no. 2020YFA0607504), the National Natural Science Foundation of China (grant no. 42371486), and the Research Funds for the Frontiers Science Center for Critical Earth Material Cycling, Nanjing University (grant nos. 0209-14380115 and 0904-14380031).

Review statement

This paper was edited by Yilong Wang and reviewed by Kukka-Maaria Kohonen, Michael Cartwright, and one anonymous referee.

Word count: 13735

Show less

© 2024. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Modeling and predicting changes in the function and structure of the terrestrial biosphere and its feedbacks to climate change strongly depends on our ability to accurately represent interactions of the carbon and water cycles and energy exchange. However, carbon fluxes, hydrological status, and energy exchange simulated by process-based terrestrial ecosystem models are subject to significant uncertainties, largely due to the poorly calibrated parameters. In this work, an adjoint-based data assimilation system (Nanjing University Carbon Assimilation System; NUCAS v1.0) was developed, which is capable of assimilating multiple observations to optimize process parameters of a satellite-data-driven ecosystem model – the Biosphere–atmosphere Exchange Process Simulator (BEPS). Data assimilation experiments were conducted to investigate the robustness of NUCAS and to test the feasibility and applicability of assimilating carbonyl sulfide (COS) fluxes from seven sites to enhance our understanding of stomatal conductance and photosynthesis. Results showed that NUCAS is able to achieve a consistent fit to COS observations across various ecosystems, including evergreen needleleaf forest, deciduous broadleaf forest, C $_{3}$ grass, and C $_{3}$ crop. Comparing model simulations with validation datasets, we found that assimilating COS fluxes notably improves the model performance in gross primary productivity and evapotranspiration, with average root-mean-square error (RMSE) reductions of 23.54 % and 16.96 %, respectively. We also showed that NUCAS is capable of constraining parameters through assimilating observations from two sites simultaneously and achieving a good consistency with single-site assimilation. Our results demonstrate that COS can provide constraints on parameters relevant to water, energy, and carbon processes with the data assimilation system and opens new perspectives for better understanding of the ecosystem carbon, water, and energy exchanges.

Details

Title

Assimilation of carbonyl sulfide (COS) fluxes within the adjoint-based data assimilation system – Nanjing University Carbon Assimilation System (NUCAS v1.0)

Author

Zhu, Huajie¹

; Wu, Mousong¹; Jiang, Fei²

; Vossbeck, Michael³

; Kaminski, Thomas³; Xing, Xiuli⁴; Wang, Jun¹

; Ju, Weimin¹; Chen, Jing M⁵

¹ International Institute for Earth System Science, Nanjing University, Nanjing, 210023, China
² International Institute for Earth System Science, Nanjing University, Nanjing, 210023, China; Jiangsu Provincial Key Laboratory of Geographic Information Science and Technology, School of Geography and Ocean Science, Nanjing University, Nanjing, 210023, China; Key Laboratory for Land Satellite Remote Sensing Applications of Ministry of Natural Resources, School of Geography and Ocean Science, Nanjing University, Nanjing, 210023, China; Frontiers Science Center for Critical Earth Material Cycling, Nanjing University, Nanjing, 210023, China
³ The Inversion Lab, Hamburg, Germany
⁴ Department of Environmental Science and Engineering, Fudan University, Shanghai, 200437, China
⁵ Department of Geography and Program in Planning, University of Toronto, ON, M5S 3G3, Canada

Pages

6337-6363

Publication year

2024

Publication date

2024

Publisher

Copernicus GmbH

ISSN

1991962X

e-ISSN

19919603

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/gmd-17-6337-2024

ProQuest document ID

3098449814

Assimilation of carbonyl sulfide (COS) fluxes within the adjoint-based data assimilation system – Nanjing University Carbon Assimilation System (NUCAS v1.0)

Jump to:

Full text

Abstract

Details

Suggested sources