Content area
The ocean is one of the largest sinks for anthropogenic carbon dioxide (Canth) and its removal of carbon dioxide (CO
1 Introduction
Humans are emitting PgC as carbon dioxide gas (CO2) to the atmosphere every year, and a portion of these emissions ( %) has entered the ocean (Friedlingstein et al., 2022). Ocean carbon accumulation mitigates global warming by slowing atmospheric CO2 accumulation. However – in a series of chemical processes known as ocean acidification – the elevated carbon content in seawater also shifts ocean carbonate chemistry toward a lower pH and carbonate ion content and toward a higher hydrogen ion (H+) content and CO2 partial pressure (CO2). These chemical shifts have varying and important impacts on marine organisms and potentially on entire ocean ecosystems (Doney et al., 2009, 2020). It is important to be able to distinguish between the ocean's large natural background dissolved inorganic carbon (DIC) content and the excess anthropogenic carbon (Canth) if we are to understand the extent, climate impact, and likely future outcomes of ocean Canth accumulation.
Ocean Canth is defined as the difference between the DIC in the modern ocean and the DIC that would be present if humans had never emitted CO2 (Sabine et al., 2004). It is not a measurable quantity as defined. Without a direct measure, Canth must be estimated, and there are numerous approaches to estimating Canth within the literature, including the following: global ocean biogeochemical model (GOBM) simulations (Khatiwala et al., 2013), data-assimilation-based ocean circulation models coupled with air–sea exchange parameterizations (Devries, 2014), approaches that rely on preformed property estimates and remineralization ratios (Vázquez-Rodríguez et al., 2009) or empirical relationships (Touratier and Goyet, 2004; Yool et al., 2010), comparisons of repeated hydrographic sections (Carter et al., 2019; Gruber et al., 2019b; Müller et al., 2023), techniques such as the transit time distribution (TTD) or Green function approaches that rely on transient tracers of air–sea exchange to infer histories of atmospheric contact and interior ocean circulation (Khatiwala et al., 2009; Waugh et al., 2006), and approaches that combine one or more of these other approaches (Sabine et al., 2004). Isotopic approaches address the related, although not identical, question of “how much of the DIC in seawater is of anthropogenic origin (e.g., Eide et al., 2017)?”. Research continues to improve upon these methodologies and to better quantify their uncertainties, often using reconstructions of exactly known model-simulated Canth distributions (Carter et al., 2019; Clement and Gruber, 2018; He et al., 2018; Matsumoto and Gruber, 2005; Waugh et al., 2006).
There are several qualities that are desirable for Canth estimation strategies. Foremost among these is accuracy, but it is also helpful for estimation approaches to be (1) accessible, (2) computationally efficient, and (3) able to return estimates for the past, present, or future. The importance of these latter three qualities is outlined in the following:
-
Accessibility. Implementation of most Canth estimation strategies requires nuanced understanding of the methodology so that decisions can be made about the parameters used in forward or inverse models or how and whether to account for various biogeochemical processes (e.g., calcification, organic matter ballasting, or iron dynamics and limitation). In addition, many Canth estimation strategies require the presence of co-located high-quality measurements of physical and biogeochemical properties (e.g., empirical multiple linear regression Canth change estimates) or transient tracer content measurements (e.g., TTD or Green-function-based estimates).
-
Computational efficiency. Some Canth estimation strategies require downloading and employing large sparse matrices (Davila et al., 2022), whereas others require iterative inverse model reconstructions or forward model simulations to be run with GOBMs (DeVries et al., 2017; Khatiwala et al., 2013).
-
Able to be estimated for the past, present, or future. Many Canth estimation techniques are limited to a narrow time window. For example, “extended multiple linear regression” approaches are usually limited to the period spanned by repeated shipboard hydrographic measurements (Carter et al., 2019; Gruber et al., 2019a; Müller et al., 2023). A related problem is the need to adjust a DIC dataset that was measured across years or decades to be specific to a single reference year or year of interest. To make this adjustment, it is important to know how much the DIC value would have changed due to Canth accumulation between when it was measured and the reference year. Simplistic adjustments invoking transient steady-state (Gammon et al., 1982) Canth accumulation are commonly employed (Carter et al., 2021a; Clement and Gruber, 2018; Lauvset et al., 2016; Müller et al., 2023), but they are problematic for larger adjustments that are often associated with longer time gaps. An example of an application that faces these challenges is given in Sect. S1 in the Supplement.
In this paper, we present three products. The first is the TRACEv1 code itself; the code is initially released only for the MATLAB computing language, although a Python port is planned. The code contains subroutines that use neural networks to remap the preformed property estimates of Carter et al. (2021b) to the locations and conditions provided by users calling the TRACEv1 routine. The second is an estimate of the likely uncertainties in TRACEv1 estimates based on an analysis of the errors found when the method is trained using transient tracer information extracted from a GOBM simulation – with a spatial and temporal distribution that mirrors the availability of CFC-11, CFC-12, and SF6 measurements in the real ocean – and is used to reconstruct the exactly known GOBM Canth distributions. The third is a data product of global Canth from TRACEv1 with varied 10- to 100-year resolution from 1750 through 2500. This product uses a variety of SSPs for projections after 2015.
2 Methods
First, we describe the conceptual framework for TRACEv1 and explain in detail how it works. Second, we introduce the observational datasets used to train TRACEv1 and explain how TTD parameters and preformed properties are empirically fit and estimated on demand. Finally, we explain how TRACEv1 is used to generate the TRACEv1_GGCanth product (Carter, 2025).
2.1 Conceptual framework and historical context
TRACEv1 emulates the inverse Gaussian (IG) TTD method for Canth estimation, although with several modifications. Traditionally, the TTD approach makes assumptions about the distribution of ages (length of time since seawater was last in contact with the atmospheric) of the various parcels of seawater that combine to produce the seawater observed in the ocean interior. Assumptions are also needed about the degree of air–sea equilibration with transient tracers. These assumptions are collectively used to tune the age distribution to match transient tracer observations, and similar assumptions are then used to infer the Canth content that would be expected for that mixture of seawater from the distribution of ages and the known history of atmospheric CO2 accumulation (e.g., He et al., 2018). TRACEv1 also follows these steps. The most important modification is that we reduce the TTD shape to a single term (), optimize this term to reflect transient tracer and modeled ideal age distributions as normal, and then train a neural network capable of predicting this term using only physical measurements of seawater and coordinate information. This allows us to estimate Canth from a TTD without the need for co-located transient tracer observations at the time and place where the estimate is desired.
When optimizing , CFC-11, CFC-12, and SF6 are dominant constraints for younger waters, while water mass ideal ages () (Thiele and Sarmiento, 1990) – taken from a model that assimilates transient tracer observations and measurements of the long-lived 14C radionuclide – are primarily included as a constraint for older water masses. SF6 measurements are particularly strong constraints for the youngest waters ventilated since the 1990s maxima in CFC-11 and CFC-12 concentrations, but they are only available for % of the measured bottles. All available constraints are used for optimizing all water parcels, and the strong constraint for young (old) waters and weak constraint for old (young) waters provided by transient tracers () is a natural result of how the values and misfits of each constraint vary with the age of the water mass. The transient tracer constraints therefore dominate in younger waters where the transient tracer measurements are largest, whereas the constraint dominates in water masses that are older than the advent of measurable atmospheric transient tracer concentrations in the period from 1940 to the 1960s. For water parcels older than , there is essentially no sensitivity to the transient tracer information. TRACEv1 is therefore more of an observation-based product in the surface ocean and an observation-tuned, model-based product in the deep ocean.
Several recent developments have enabled TRACEv1. First, the training data are taken from the recent 2023 update to the Global Data Analysis Project version 2 (GLODAPv2.2023) data product (Lauvset et al., 2024). This data product contains bottle measurements with both CFC-11 and CFC-12 and more measurements with CFCs and SF6 measurements (Fig. 1); SF6 was first included in the 2022 GLODAP release (Lauvset et al., 2022). CFC distributions have long been used to estimate Canth, and oceanographic SF6 measurements are available from many recent cruises owing to methodological developments by Tanhua et al. (2004) and advances allowing CFC and SF6 measurements on the same samples (Bullister et al., 2006) implemented by transient tracer teams globally (Erickson et al., 2023). Second, water mass ideal ages from the recently released transport matrix solutions of the Ocean Circulation Inverse Model of John et al. (2020) provide an additional constraint for TRACEv1. TRACEv1 uses a preformed property data product (Carter et al., 2021b) to estimate the composition of seawater when it was last exchanging CO2 with the atmosphere. Finally, the approach is assessed against newly simulated Canth, CFC, and SF6 distributions (Müller, 2023) that were generated as part of the second Regional Ocean Carbon Cycle Assessment and Processes effort (RECCAP2; e.g., DeVries et al., 2023). The simulated CFC and SF6 distributions (Schwinger, 2024) were not previously published as part of the RECCAP2 data product or used by the analyses.
Figure 1
Locations and years of measurements of CFC-11 and CFC-12 in the GLODAPv2.2023 data product (Lauvset et al., 2024). Dark borders around measurements indicate that SF6 is available alongside CFC-11 and CFC12.
[Figure omitted. See PDF]
2.2 How TRACEv1 worksWe begin with a summary of the TRACEv1 functions and then explain the various steps in greater detail. The TRACEv1 code comprises the following functions:
-
It uses a neural network to estimate an age distribution for seawater from a user-specified location, , and , and it returns the mean age if this is a desired output.
-
TRACEv1 uses a record or projection of the atmospheric CO2 in the years leading up to the date of the desired estimate to determine an anthropogenic CO2 level for each component of the water mass mixture.
-
The code convolves the age distribution with the component's CO2 history to estimate a component-fraction-weighted mean atmospheric CO2 for the water parcel.
-
It uses another set of neural networks to estimate the preformed properties of this water mass mixture from the user-specified location, , and .
-
TRACEv1 estimates the degree of CO2 disequilibrium expected for the surface ocean when responding to rapid changes in the atmospheric mole fraction of CO2, CO2.
-
The code solves for the Canth distribution as the difference between the DIC value that corresponds to the surface ocean equilibration level associated with the transient CO2 and the DIC value that corresponds to a “preindustrial” atmospheric CO2 of 280 ppm. TRACEv1 allows users to substitute arbitrary reference preindustrial CO2 values to obtain estimates that are comparable to literature estimates that have used alternative baselines, but all calculations provided herein are obtained using 280 ppm.
Preformed properties are estimates of the properties that interior ocean seawater mixtures had when they last were in contact with the atmosphere near the ocean surface. These are the properties that impacted air–sea gas exchange equilibrium processes when Canth was last able to change through contact with the atmosphere. In TRACEv1, preformed total titration seawater alkalinity content (TA0), preformed dissolved inorganic silicate content (Si0), and preformed dissolved inorganic phosphate content (P0) are collectively used with CO2 as constraints for the carbonate chemistry of seawater near the sea surface. These three quantities are estimated from three separate neural networks trained using latitude, longitude, depth, , and from the Lauvset et al. (2016) global gridded version of the GLODAPv2 data product as predictor information and the preformed property estimates of Carter et al. (2021b), estimated for the same gridded product, as target/validation data. Errors in preformed properties are small contributors to the overall Canth uncertainty (Sect. S3).
The fourth neural network estimates , which is used to construct the TTD. The TTD is an evolution of the “water mass age” concept (Bolin and Rodhe, 1973; Hall and Haine, 2002). While a water mass age is an estimate of the average length of time since a given parcel of interior ocean seawater was last at the ocean surface, a TTD comes from the recognition that interior ocean seawater is better represented as a mixture of many different water parcels – each with a different history of atmospheric contact and interior ocean circulation – than as a single parcel of water with a single . One-dimensional pipe flow with diffusion results in a distribution of ages that can be well approximated using an inverse Gaussian (IG) age fraction distribution (Peacock and Maltrud, 2006; Waugh et al., 2003) and provides good agreement with available transient tracer data (Sonnerup et al., 2013; Stanley et al., 2012; Waugh et al., 2004). However, there are places in the ocean where comparatively “young” (i.e., recently ventilated) waters mix with very old deep waters in appreciable amounts (e.g., Antarctic Intermediate Water, which is formed through the mixing of fresh surface waters near the polar front with upwelling upper Circumpolar Deep Water; see Naveira Garabato et al., 2009), and the one-dimensional pipe model age distribution is inadequate in these areas (Ito and Wang, 2017; Peacock and Maltrud, 2006). With this and similar concerns driving innovation, many variants on the underlying TTD shape have been used. However, our limited experimentation with these variants did not reveal any meaningful improvement over the simple IG distribution for reconstructing modeled Canth (Hall et al., 2002; Waugh et al., 2003, 2006); thus, we retain the simple IG formulation. Given the limited number of options tested, it is plausible that alternative age distributions could outperform the distribution fitting terms that we employ for TRACEv1. This is particularly likely for estimates, as erroneous TTD shapes have been shown to be less problematic for Canth than for due to the similarities between the atmospheric growth curves for transient tracers and Canth (Waugh et al., 2006).
The traditional form of the inverse Gaussian for an arbitrary coordinate variable “ is as follows:
1 where is the mean and is the shape parameter. However, in TTD literature, it is more common to specify this equation as follows: 2 where is equivalent to and the new shape parameter is related to by 3 Some consideration has been given in the literature to the ideal values for and for TTD analyses. Based on the results of He et al. (2018), we choose a and (or ), and we find in our model-based assessments that this assumption performs equivalently (within uncertainties) to the common alternative assumption of . The standard form of the IG probability distribution with a and (Eq. 2), or a 1 and a (Eq. 1), is evaluated from to (in increments of 0.01) using the “makedist” function in MATLAB.
The predicted parameter is used to convert a unitless IG distribution into an age distribution. This is used to identify the ages associated with this IG probability distribution, where the age values assigned to the 500 values equal years. The resulting age–probability distribution is then interpolated to integer ages for the most recent 1000 years. When is , it becomes impossible to interpolate across all 1000 years; however, in these cases, the missing values correspond to negligible fractional contributions and are neglected. The sum of these interpolated contributions usually diverges slightly from 1 due to the discretization of the continuous probability distribution and the inability to interpolate to all years, so the non-neglected component fractions are further divided by their sum to ensure that they add to unity. Thus, when is a large number, the mean of the Gaussian distribution is large (Fig. 2), whereas when is smaller, is smaller.
Figure 2
Three example ventilation-year distributions for a parcel of water observed in the year 2020. The “young,” “default,” and “old” mixtures in orange, blue, and yellow have mean ages of , 91, and 460 years, respectively. Fractions of a given color add up to 1 when summed across all years.
[Figure omitted. See PDF]
Figure 3
A comparison between surface ocean CO2 values in the model-based data product of Jiang et al. (2023) and (a) the modeled atmospheric CO2 value and (b) the value obtained from Eq. (5). Black lines are provided for reference, and the colored dots indicate projected and historical values from four different SSPs.
[Figure omitted. See PDF]
Once the age distribution is known, the atmospheric CO2 record is convoluted into the age distribution as outlined as follows (Hall et al., 2002): first, the atmospheric record or projection is interpolated to obtain values for the year of the desired estimate minus the ages in the distribution. Then, for each of the (up to) 1000 fractions of the water mass, the fraction-weighted mean ages () and concentrations () can be computed as fraction-weighted sums. For example, for gas with atmospheric concentration [] summed over the years prior to the estimate of interest, this would be computed as follows: 4 For the calculation, [] is replaced in this equation with years. The concentration values reflect complete air–sea equilibration, which is inconsistent with net ocean uptake of CO2 from air–sea gas exchange. For example, in the RECCAP2 model simulations, there is a atm increase in the surface ocean CO2 in 2018 relative to the preindustrial value compared to a 128.72 mol change in the atmospheric CO2 (DeVries et al., 2023; Müller, 2023). Also, the air–sea CO2 disequilibrium is thought to vary temporally (He et al., 2018) and be sensitive to the rate of atmospheric CO2 change. Therefore, we derive an empirical relationship between atmospheric CO2 and the median model–observation hybrid apparent surface ocean CO2 record given by Jiang et al. (2023). A variety of predictive relationships were tested, and the strongest predictive relationship (lowest root-mean-square error, RMSE) was obtained for the following: 5 Equation (5) suggests that the expected surface ocean CO2 value in an arbitrary year CO2,oce.year can be estimated as a function of the atmospheric CO2 in that year (CO2,atm.year_minus_65) and the difference between that atmospheric value and the value in the atmosphere 65 years prior (CO2,atm.year). Applying Eq. (5) to the CO2 record before use in TRACEv1 meaningfully reduces the mismatch between the simulated surface ocean CO2 and the atmospheric CO2 (Fig. 3). An additional constant offset of atm was found in the best-fit relationship (not shown on the right-hand side of Eq. 5), but this term likely reflects the water vapor correction between CO2 and CO2 and, potentially, parameterized net model degassing of riverine carbon. TRACEv1 neglects this constant offset because the code separately applies the water vapor correction for each parcel of seawater (Dickson et al., 2007) when converting between CO2 and CO2 and because including this term would have a nearly identical impact on preindustrial CO2.
Once water-fraction-weighted mean CO2 values are estimated for a parcel of seawater, the expected equilibrium DIC value for the water parcel when last at the ocean surface is calculated using estimated TA0, Si0, and P0. These calculations are repeated with both the and a user-provided preindustrial CO2 value (default is 280 ppm, adjusted for water vapor), and their difference is attributed to Canth. During fitting of the values (described later), a similar procedure is followed for transient tracer observations with CFC and SF6 equilibrium constants (see Warner and Weiss, 1985, and Bullister et al., 2002, respectively), although without adjustments for incomplete equilibration because the equilibrium timescales for these tracers are shorter than for CO2.
Carbonate chemistry calculations are computed with the CO2SYS code written for MATLAB (Van Heuven et al., 2011) and modified herein to increase the tolerance for pH changes during iteration from 0.0001 to 0.001 when converging on a pH value (to speed up the calculation). Carbonate dissociation constants from Lueker et al. (2000) are used with the total boron calculation from Uppström (1974) and the calculation from Perez and Fraga (1987). and values that are outside the viable range for these carbonate chemistry constants (–48 and –35 °C) are overridden with the nearest viable and values. This override has a minimal impact on most Canth calculations for common seawater types, but we caution here that TRACEv1 is not intended for use in freshwater or brackish environments. Information on computing optimization is provided in Sect. S4.
2.3 Data and model output used to train and run TRACEv1The parameter is fit to the CFC-11, CFC-12, and SF6 partial pressures that would be found in a gas phase in complete air–sea equilibrium with seawater with the measured composition, as well as to from the Ocean Circulation Inverse Model (OCIM) transport matrix (John et al., 2020) when the zero-age boundary layer is set equal to the shallowest layer in the OCIM model. For the real ocean, the transient tracer partial pressure values are taken as calculated from discrete seawater measurements in the GLODAPv2.2023 data product (Lauvset et al., 2024). Co-located measurements of salinity and temperature are also extracted from this data product. For the model reconstruction test, Canth, , and are taken from or computed from the NorESM RECCAP2 simulations (Müller, 2023). In addition to the standard RECCAP2 outputs, this model was also used to simulate CFC-11, CFC-12, and SF6 through the start of 2015 (Schwinger, 2024). The approaches used to obtain scattered values from gridded model output and to obtain scattered ages from the OCIM transport matrix are given in Sect. S5.
TRACEv1 allows more than nine options for atmospheric CO2 projections/histories and relies on a single reconstruction of transient tracers taken from a data product compiled by the United States Geological Survey (USGS) Reston Groundwater Dating Laboratory (see the “Data availability” section). The atmospheric CO2 reconstruction starting in the year 1 and continuing through the year 1000 is taken from the synthesis by Frank et al. (2010) for all CO2 options. Before the year 1, all reconstructions are set to a constant value of CO ppmv, equaling the atmospheric concentration in the year 1. From 1001 and through 1959, all reconstructions follow the historical concentrations of the SSPs as defined by Meinshausen et al. (2020), which are identical over this time range. From 1959 through 2022, the first option, which is called “historical/linear” and is the default option if no alternative is specified, uses the Mauna Loa measurements by Keeling et al. (1976) and Thoning et al. (1989), and if TRACEv1 is instructed to use this record to generate an estimate for a year that is after 2022, the slope from a linear trend fit to the last 10 years of the historical record is used to project to the year of the desired estimate. The remaining eight options are SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP3-7.0-lowNTCF, SSP4-3.4, SSP4-6.0, and SSP5-3.4-over, all as defined by Meinshausen et al. (2020). The SSPs diverge from each other starting in 2017. Between the years 1959 and 2017, the SSP values have a small average bias of ppmv compared to the historical Mauna Loa measurements with a root-mean-square disagreement of 0.8 ppmv. Additional custom concentration pathway options can be added by appending a new column of atmospheric CO2 concentrations to a plain text file (CO2TrajectoriesAdjusted.txt) that is read by TRACEv1 and by entering the number of the new option in the TRACEv1 code (i.e., if a 10th option is added, the CO2 pathway option for the “AtmCO2Trajectory” input would be 10). However, any user-provided concentration pathways should be adjusted by Eq. (5) before appending them to this file.
Figure 4
(a) The time history of atmospheric transient tracer and CO2 concentrations expressed as a percentage of their maximum deviation through 2020 from their assigned preindustrial values of 0 ppmv for CFC-11, CFC-12, and SF6 and 280 ppmv for CO2.(b) The nine atmospheric CO2 concentration pathway options used by TRACEv1, with all but “historical/linear” being SSPs as given by Meinshausen et al. (2020). Both versions of SSP3-7.0 fall nearly on top of each other on this plot and are assigned the same colors.
[Figure omitted. See PDF]
2.4 Fitting TRACEv1 parametersThe parameters are optimized using a bounded minimum “search” function (“fminsearchbnd” in MATLAB) with an initial value of , an upper bound of 1000, and a lower bound of . This function uses a Nelder–Mead simplex algorithm (Lagarias et al., 2006) with iterative variations in the term by 5 % to minimize a cost function. For each iteration of this solver, the (i.e., CFC-11, CFC-12, and SF6) transient tracer constraints and the are first calculated as described in Eq. (4). The cost function that is minimized for this solver () is the sum of the squared normalized errors of the three partial pressures and , or
6 Here, is the measured partial pressure of transient tracer extracted from discrete GLODAPv2.2023 product or (for the model validation experiments) from GOBM output, is the value calculated from and the record of atmospheric trace gas concentrations as described above, and is the atmospheric partial pressure of tracer in the year 2020. This third term is included to normalize the errors to a more comparable scale. Without this term, the CFC-12 (SF6) errors would be assigned higher (lower) weight than the errors in the other two transient tracers due to their greater (smaller) atmospheric partial pressures. Similarly, is the interpolated OCIM age, is the calculated , and is the ideal age of the oldest grid cell found in the OCIM age calculations (1354 years).
This process is repeated for the observational record and for the model output. The version of TRACEv1 that is trained on model output is referred to as TRACEv1_validation_NorESM, and details of this comparison are provided in Sect. S6. The version trained on real-world observations is referred to as TRACEv1. We generate 10 versions of TRACEv1 in which we retrain TRACEv1 after perturbing the transient tracer measurements from each cruise in GLODAPv2.2023 by a cruise-wide relative offset and each measurement by measurement-specific random perturbations. In Sect. S7, we quantify the likely impact of measurement uncertainties in the transient tracer measurements on the final Canth estimates via Monte Carlo analysis.
2.5 Canth data product creationWe use the gridded, temporally averaged GLODAPv2 data product (Lauvset et al., 2016) for , , latitude, longitude, and depth and vary only the year of the estimate to equal 1750, 1800, 1850, 1900, 1950, 1980, 1994.5, 2000, 2002, 2007.5, 2010, 2014.5, 2020, 2030, 2050, and 2100. Estimates are only made using the historical/linear and SSP1-1.9 reconstructions prior to 2010 (and we note that the SSPs are identical over this period). In 2020 and thereafter, estimates are provided for each of the nine CO2 concentration pathway options separately. The estimates in 1994.5, 2002, 2007, and 2014 are provided for comparison and interoperability with published literature distributions (Gruber et al., 2019a; Lauvset et al., 2016; Müller et al., 2023; Sabine et al., 2004).
We anticipate that the small differences between the 1750 and 1850 CO2 concentration estimates could prove useful for reconciling literature estimates of Canth that have been made to be specific to these two common choices of reference year. Our Canth definition is specific to the 280 ppmv atmospheric concentration rather than to a specific year. It is therefore possible for TRACEv1 to return very small negative Canth values, particularly for estimates following periods in which CO2 reached minima of ppmv in the 1st, 6th, and 18th centuries CE. The last time the atmospheric CO2 concentration was believed to equal 280 ppmv was 1790 (Frank et al., 2010), and TRACEv1 allows users to specify an alternative reference concentration.
3 Results and discussion
We discuss the uncertainty assessment and compare TRACEv1 reconstructions to alternatives, discuss the TRACEv1 projections through 2500, and highlight some areas where TRACEv1 is limited and might be improved. We compare TRACEv1 estimates to alternatives in Sect. S8.
3.1 Uncertainty estimation
In Sect. S5, we describe the results of our uncertainty assessments from model reconstruction (subscript MR) Canth distributions. In Sect. S6, we present the results of the Monte Carlo (subscript MC) analysis. Here, we combine the results of these analyses to estimate the uncertainty of TRACEv1 () estimates that results from several sources. The model Canth reconstruction estimates reveal methodological uncertainties, including the limitations of an IG TTD and the inaccuracies associated with using a neural network across a large geographical area, and the uncertainties that result from potential OCIM distribution and preformed property distribution errors. The Monte Carlo analysis reveals uncertainties that result from random uncertainties and cruise-wide offsets in transient tracer concentration measurements. We add these uncertainties in quadrature to obtain the overall uncertainty estimate () for TRACEv1 ():
7 Here, is the Monte Carlo RMSE estimate of 1 mol kg−1 for Canth and % for inventories, and is the uncertainty estimate from the model reconstruction of 4.4 mol kg−1 for Canth and 15 % for inventories, conservatively chosen because the model reconstruction reproduces inventories to within 10 % in 1980 and 2014. The uncertainty appears to grow with the estimate and over time, so 15 % of the estimated Canth is used when this value exceeds 4.4 mol kg−1. These uncertainty estimates neglect the contribution of uncertainty in the and values used in the neural network to the overall Canth estimate uncertainty, which we believe to be small relative to , but we note that users can conduct perturbation tests if they are supplying particularly uncertain and information. is an optional output from TRACEv1. In Sect. S6, we show that reconstruction errors are significantly larger in marginal seas with few or no transient tracer measurements and are also elevated near coasts and in areas of strong upwelling. should be considered an underestimate in these regions. We do not attempt to estimate uncertainty in the optional TRACEv1 output.
3.2 Canth reconstructions and data product comparisonsThe reconstructions and projections from TRACEv1 (Table 1) match past anthropogenic inventory estimates obtained from analyses based on measurements of DIC changes and distributions (Fig. 5) in 1994 (118() PgC from Sabine et al., 2004, vs. 127() for TRACEv1), 2007 ( PgC from Müller et al., 2023, updating Gruber et al., 2019a, vs. 161() for TRACEv1), and 2014 ( PgC from Müller et al., 2023, vs. 182() for TRACEv1). The agreement with the DIC-based approaches is reassuring, as there is little overlap in the data or methodologies used to generate the DIC-based estimates compared to the data and methods used to obtain the TRACEv1 routines: Müller et al. (2023) did not rely on transient tracer information, and the data used in this study are, on average, more recent than the CFC-11 and CFC-12 information used by Sabine et al. (2004) (Fig. 1).
Figure 5
Panel (a) shows the global Canth inventory projected by TRACEv1, with blue shading indicating the uncertainty estimate. Circles show estimates from literature-data-based Canth distribution estimates, with filled circles indicating estimates rooted primarily in DIC measurements and open circles indicating estimates rooted primarily in fitting transient tracer distributions. Panel (b) shows projected values through 2500, using solid lines for various SSPs (as labeled). Thin dotted lines indicate the inventories that would be obtained by projecting the 2020 estimate using transient steady-state assumptions (Gammon et al., 1982) with the atmospheric CO2 concentrations from the SSPs with the same line color. Both versions of SSP3-7.0 fall nearly on top of each other on this plot and are assigned the same colors.
[Figure omitted. See PDF]
Table 1TRACEv1 estimates of Canth inventories (in PgC uncertainties) calculated by ocean basin for the specified points in time. The Atlantic becomes the Arctic at 40° N, whereas the Pacific transitions at 67° N. The Southern Ocean is defined as the area in all basins south of 40° S. Anthropogenic inventories are small and negative in 1750 because of the -year-long period with a ppmv CO2 atmosphere prior to the industrial era.
| Year | Pacific | Atlantic | Indian | Arctic | Southern | Total |
|---|---|---|---|---|---|---|
| 1750 | ||||||
| 1800 | 0() | |||||
| 1850 | 0() | 0() | 0() | 0() | ||
| 1900 | 4() | 4() | 2() | 2() | 4() | 18() |
| 1950 | 15() | 14() | 7() | 6() | 14() | 56() |
| 1980 | 25() | 24() | 13() | 9() | 24() | 95() |
| 1994 | 35() | 32() | 17() | 11() | 32() | 127() |
| 2000 | 39() | 35() | 19() | 13() | 35() | 140() |
| 2002 | 40() | 37() | 20() | 13() | 37() | 147() |
| 2007 | 44() | 40() | 21() | 15() | 40() | 161() |
| 2010 | 47() | 42() | 22() | 15() | 42() | 168() |
| 2014 | 50() | 45() | 24() | 16() | 46() | 182() |
| 2020 | 56() | 50() | 27() | 18() | 50() | 201() |
The regional distribution of the Canth inventory qualitatively matches prior estimates as well, with significantly higher column inventory estimates in the North Atlantic (Fig. 6, Table 1). Similarly, there are areas of higher column inventories generally in the Southern Hemisphere portions of the other ocean basins, as mode and intermediate waters are exported northward from the Southern Ocean. Within Fig. 6, bathymetric features such as the Kerguelen Plateau and the Mid-Atlantic Ridge are visible when they displace waters that would otherwise contain meaningful quantities of Canth, and a band of low column inventories can be seen within the Antarctic Circumpolar Current where old deep waters upwell to near the ocean surface.
Figure 6
Column inventory of Canth mapped for 2020 using TRACEv1 with the historical/linear atmospheric CO2 pathway.
[Figure omitted. See PDF]
TRACEv1 has a more variable agreement with estimates based on transient tracer information. The estimates are higher than – but within uncertainties of – the Green function fits of Khatiwala et al. (2009) and a TTD-based inventory estimate (Waugh et al., 2006). TRACEv1 estimates of 168() PgC are within uncertainties of the 178 PgC inventory of Davila et al. (2022) calculated in 2010 using the total matrix intercomparison approach. At 172() PgC, TRACEv1 estimates are near the OCIM estimates of Devries (2014) of 160–166 PgC in 2012, although this not surprising because the estimates implied by an OCIM solution were used as a fitting parameter for TRACEv1. The TTD-based Canth inventory for 2002 in the gridded GLODAPv2 data product of Lauvset et al. (2016) is 179 PgC compared to a TRACEv1 estimate of 147(22) PgC in the same year. In Sect. S9, we show that the main disagreement between the TRACE estimates and the GLODAPv2 gridded product (Lauvset et al., 2016) is found in the deep ocean, where GLODAPv2 inventories consistently exceed TRACE inventories below m. There are several possible reasons for this disagreement, but the true cause is unclear.
3.3 Canth inventory projectionsThe Canth inventory projections (Table 2, Fig. 5b) indicate that, even if humanity acts to rapidly reduce Canth in the atmosphere and manages to bring atmospheric CO2 down to 337 atm by the middle of the millennium in line with the ambitious SSP1-1.9 scenario, the ocean will never – on this time horizon – cease to take up additional Canth, picking up an additional 5.4 PgC between 2400 and 2500. This builds on the findings of Koven et al. (2022) and Jones et al. (2016) using full model simulations through 2300 and suggests that the impacts of ocean acidification are likely to continue to spread throughout the ocean depths, even with a highly successful carbon management policy. Nevertheless, such action remains important for preventing ocean acidification, as the degree of surface and interior ocean acidification depends strongly on which SSP we follow. This is particularly true for the well-lit surface euphotic zone that is the base of most marine food webs: the relative proportion of marine Canth shifts increasingly from the surface ocean to the ocean depths over time (Fig. 7a), and this tendency becomes more pronounced the more rapidly and completely that atmospheric CO2 emissions are curtailed and reversed (Fig. 7b). Indeed, several SSPs show reduced surface Canth relative to modern values despite the continued ocean Canth accumulation. An important caveat is that these findings do not consider the impacts of changes in heat and freshwater content, circulation, or changes in the ocean's biological pump; rather, they only reflect the impact expected from changing atmospheric CO2 and the oceanic buffering capacity.
Table 2
TRACEv1 projections of global ocean Canth inventories (in PgC) until the middle of the millennium if the indicated atmospheric CO2 concentration pathway is followed.
| 2020 | 2030 | 2050 | 2100 | 2200 | 2300 | 2400 | 2500 | |
|---|---|---|---|---|---|---|---|---|
| Historical/linear | 201() | 236() | 315() | 545() | 1078() | 1623() | 2132() | 2591() |
| SSP1-1.9 | 200() | 235() | 293() | 389() | 430() | 448() | 458() | 463() |
| SSP1-2.6 | 200() | 236() | 310() | 450() | 588() | 662() | 706() | 731() |
| SSP2-4.5 | 200() | 238() | 326() | 567() | 975() | 1254() | 1417() | 1512() |
| SSP3-7.0 | 200() | 240() | 341() | 687() | 1570() | 2290() | 2730() | 2984() |
| SSP3-7.0-lowNTCF | 200() | 240() | 341() | 683() | 1559() | 2281() | 2725() | 2982() |
| SSP4-3.4 | 200() | 236() | 311() | 473() | 601() | 665() | 704() | 728() |
| SSP4-6.0 | 200() | 237() | 329() | 602() | 1123() | 1504() | 1735() | 1871() |
| SSP5-3.4-over | 200() | 240() | 346() | 535() | 663() | 721() | 754() | 772() |
Figure 7
(a) The relative inventory of Canth vs. depth in various years of the historical/linear projection, expressed as the percentage of the total DIC inventory that is found within each 1 m interval. Here, a shrinking surface value indicates that a greater proportion of the signal is found at depth, but it does not necessarily imply a lower surface Canth. Panel (b) shows the total inventory in 2500 vs. depth using solid lines for each of the CO2 concentration pathways employed by TRACEv1, with the 2020 historical/linear inventory plotted as a dashed line for comparison. Both versions of SSP3-7.0 fall nearly on top of each other on this plot and are assigned the same colors.
[Figure omitted. See PDF]
One intended use for TRACEv1 is adjusting the DIC measurement to a reference year. The simple approximation of a transient steady state (Gammon et al., 1982) has been used in several recent studies (e.g., Lauvset et al., 2016; Clement and Gruber 2018; Carter et al., 2021a; Müller et al., 2023), and our projections show that this assumption performs plausibly for projections over short timescales. However, we contend that TRACEv1 provides a superior means of adjusting DIC measurements to be appropriate for a reference year. For example, the differences between modeled Canth between 1980 and 2014 in NorESM disagree with the differences between TRACEv1 estimates for those same years by an average of mol kg−1. The statistics are worse at mol kg−1 when modeled differences are compared instead to the differences between the 1980 Canth values and the 1980 values scaled to 2014 using transient steady-state assumptions. Thus, both adjustments are reasonable from 1980 to 2014, but the transient steady-state adjustment tends to overpredict the change. Also, unlike TRACEv1, transient steady-state adjustments require an independent estimate of Canth. (For the comparison above, they were provided the exactly correct model Canth distribution in the earlier year, although this is never known in the real ocean.) Finally, the transient steady-state assumption is also known to break down if the atmosphere ceases to increase in its tracer concentration exponentially; this occurs by 2500 for CO2 in all SSPs, although most SSPs reach this point much sooner (Meinshausen et al., 2020). It can be seen in these cases that a transient steady state results in large errors in the projected Canth inventories by mid-millennium and even projects spurious decreases (Fig. 5b).
3.4 Limitations and future directionsThere are several notable limitations of the TRACEv1 method:
-
It presumes fixed circulation and is unable to resolve most timescales and modes of Canth variability.
-
It shows larger reconstruction errors in regions that lack training data, which is a common problem for neural networks and other regression strategies (e.g., Carter et al., 2021a). As transient tracer measurements with the strong SF6 constraint are still relatively rare (approximately 5 % of the GLODAPv2.2023 data product contains all three transient tracer measurements), it is likely that TRACE will improve as more such measurements become incorporated. However, version 1 of TRACE should be used with caution in regions without training data, and this caution applies to many marginal seas (Fig. 1 and Fig. S1 in the Supplement).
-
TRACEv1 appears to overestimate Canth in surface waters where there is meaningful upwelling, although perhaps not by a larger extent than alternative Canth estimation strategies. This is unfortunate because such surface waters are frequently found in areas of naturally low pH that are of interest for ocean acidification research.
-
The method has not yet been well validated in a high-resolution model representation of a coastal environment, so its uncertainties are not well estimated outside of the open ocean. While the circulation information encoded in TRACEv1 has been optimized within a limited parameter space, it is likely – based on past literature exploring many options for simplifying the complex distributions of myriad water types that mix in the ocean interior – that the comparatively simple single term that we employ herein to constrain interior ocean age distributions could be meaningfully improved. We leave this to future work.
-
Furthermore, the TTD approach is limited by the need for an assumed air–sea disequilibrium and the possibility that the degree of disequilibrium for transient tracers varies meaningfully over time and space and between CFCs and SF6 (Shao et al., 2013; Sonnerup et al., 2015) and differs from the related term for air–sea CO2 disequilibria, which seems likely due to the slow relaxation of CO2 disequilibria (Jones et al., 2014) and the faster rate of transient tracer equilibration (Wanninkhof, 2014). A common assumption of 100 % equilibration tends to result in TTD approaches overestimating Canth (Waugh et al., 2006). We include an empirical relationship intended to deal with this issue but note that its formulation remains somewhat ad hoc and based on model simulations of surface ocean conditions.
-
TRACEv1 is aimed at resolving the accumulation of Canth under steady-state circulation. However, it is possible that it is able to resolve some non-steady-state components of Canth accumulation when it is called with time-varying temperature and salinity records as predictors. It is yet untested to what degree this is an effective strategy for capturing such variability.
-
Some fitting strategies have shown improvements when the signal of interest is fit to the disagreement between observations and a model prior, instead of being fit directly to the signal. This approach could improve estimates if a model prior age distribution can be obtained and be regridded to global locations of interest in a computationally efficient manner.
-
Further optimization of the shape of the TTD could result in improved Canth reconstructions.
-
MATLAB is an open-source language, but it is not freely available. It would therefore further improve the accessibility of Canth estimates if TRACEv1 were released in a freely available computing package. Prior experience suggests that a modest amount of script is required to convert neural networks from MATLAB to Python, whereas somewhat less script is required to transition the code to Julia. This is left to future work.
4 Data availability
The gridded GLODAP product is available at
5 Code availability
TRACEv1 code can be found and freely obtained at 10.5281/zenodo.15692788 (BRCScienceProducts, 2025).
6 Conclusions
We present a new method called TRACEv1 for rapidly estimating the time-varying Canth distribution throughout the open ocean, including detailed error estimates. TRACEv1 is available as a function in the MATLAB programming language. We further provide a data product with Canth distributions for a range of years (TRACEv1_GGCanth; Carter, 2025) on the GLODAPv2 gridded product grid used by Lauvset et al. (2016). We use this data product to examine how the Canth distribution varies with depth and time, and we show that the ocean can be expected to continue to increase its Canth inventory through 2500 for all SSPs. We find that SSP3-7.0 results in the largest projected 2500 ocean Canth inventory of 2984(448) PgC, and this represents a -fold increase over the 2020 Canth inventory.
There are several strengths of the TRACEv1 method, which relies on TTDs to estimate Canth distributions from a time-evolving atmospheric CO2 trajectory. The method is easy and quick to implement, shows fidelity to model reconstructions and agreement with recently published data-based estimates, and only requires and measurements and spatiotemporal coordinate information to produce an estimate. It also provides means to plausibly adjust collections of DIC measurements collected over time to a common time by removing the influences of Canth changes. While the reconstruction fidelity of TRACEv1 estimates were quite high in a test using model output with exactly known Canth distributions, we nevertheless believe the primary advantage of TRACEv1 and the new data product is their accessibility.
The supplement related to this article is available online at
Author contributions
BRC led the conceptualization of the work, coding, drafting of the manuscript, and generation of the data products. All authors aided with the conceptualization and provided edits and feedback on drafts of the manuscript. JS also generated simulated fields of CFCs and SF6. JDS, DES, JES, and LMD tested portions of the code.
Competing interests
The contact author has declared that none of the authors has any competing interests.
Disclaimer
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
Acknowledgements
The authors wish to thank the principal investigators and analysts who have spent decades laboring to collect the transient tracer information that makes TRACEv1 and the data product possible. Brendan R. Carter is grateful to the US National Science Foundation (NSF) for the funds to develop TRACEv1 under “Collaborative Research: Improving the accuracy and uncertainty associated with estimated CO2 from pH sensors on autonomous profiling platforms” (grant no. 2048509). The data synthesis efforts, comparisons to alternative estimates, and development of the preformed property neural networks were performed with the support of the Global Ocean Monitoring and Observing program of NOAA through funding to the Carbon Data Management and Synthesis Program (grant no. 100007298). Larissa M. Dias's contributions were supported by the Biogeochemical-Argo algorithms grant from NOAA's Climate Variability and Global Ocean and Monitoring Observation Programs (grant no. NA21OAR4310251). Daniel E. Sandborn is grateful to the NSF Division of Ocean Sciences (OCE) for support through the award entitled “Collaborative Research: US GO-SHIP 2021-2026 Repeat Hydrography, Carbon, and Tracers” (grant no. OCE-2023545). The Pacific Marine Environmental Laboratory (PMEL) and Cooperative Institute for Climate, Ocean, and Ecosystem Studies (CICOES) contributions are numbers 5695 and 2024-1424, respectively. Andrea J. Fassbender's contribution was supported by PMEL, Rolf Sonnerup's contribution was supported by NSF award no.1634256, and Jonathan D. Sharp's contribution was supported by PMEL and CICOES under cooperative agreement no. NA20OAR4320271.
Financial support
This research has been supported by the National Oceanic and Atmospheric Administration (grant nos. 100007298 and NA21OAR4310251), the Directorate for Geosciences (grant no. 2048509), the National Oceanic and Atmospheric Administration (grant nos. 100007298 and NA21OAR4310251), and the US National Science Foundation (grant nos. 2048509 and OCE-2025535).
Review statement
This paper was edited by Sebastiaan van de Velde and reviewed by Scott C. Doney and Toste Tanhua.
© 2025. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.