Introduction

Estimates of global mean surface temperature anomalies (GMST), derived from a combination of near‐surface air temperatures from land stations and sea‐surface temperatures over oceans, have long been a staple of climate study. GMST and derived trends or changes, ΔGMST, have featured prominently in IPCC reports, and are a key component in assessments of climate change attribution (Bindoff et al., 2013), climate model validation (Flato et al., 2013), global carbon budgets (Rogelj et al., 2018) and climate impacts (Hoegh‐Guldberg et al., 2018). Perhaps most importantly, the IPCC's long‐term ΔGMST estimate of 0.85°C, based on the 1880–2012 linear trend, was a key scientific input to the Paris Agreement to keep global surface temperature change well below 2°C (IPCC, 2014; UNFCCC, 2015).

The IPCC Fifth Assessment Working Group I Report (IPCC WG1 AR5; Hartmann et al., 2013a) used three GMST data sets: HadCRUT4 (Morice et al., 2012), NASA GISTEMP (Hansen et al., 2010), and NOAA MLOST (Vose et al., 2012). While HadCRUT4 begins in 1850, the NOAA and NASA data sets only begin in 1880, so the 1880–2012 ordinary least squares (OLS) linear trend was presented as a “headline” warming estimate along with the HadCRUT4 1850–1900 to 2003–2012 difference in the Summary for Policymakers (IPCC, 2013). OLS trends for all data sets were also given for 1951–2012 and 1979–2012 with uncertainties adjusted to account for autocorrelated residuals (Hartmann et al., 2013b; Santer et al., 2008).

The IPCC Special Report on Global Warming of 1.5°C (IPCC SR1.5; Allen et al., 2018) included two new GMST data sets that incorporated sophisticated spatial interpolation: Cowtan‐Way (Cowtan et al., 2015; Cowtan & Way, 2014a, 2014b) and Berkeley Earth (Rohde et al., 2013). Reported ΔGMST was 0.87°C ± 0.12°C based on the average of HadCRUT4, NOAA, NASA and Cowtan‐Way. An observation‐based estimate of Global Surface Air Temperature change (ΔGSAT) was introduced by adjusting HadCRUT4 ΔGMST to account for incomplete coverage and the discrepancy between ocean air and sea‐surface temperature anomalies, thus producing an estimate of near‐surface air temperature at 2 m over the entire globe (Cowtan et al., 2015; Rogelj et al., 2018). The ΔGSAT estimate of 0.97°C in 2006–2015 implied lower remaining carbon budgets compared to preceding studies based on ΔGMST consistent with AR5's 0.85°C through 2012 (Goodwin et al., 2018; Millar et al., 2017, 2018; Richardson et al., 2018).

IPCC WG1 AR5 Box 2.2 discusses the following issues with linear trends for estimating ΔGMST: (1) poor approximation of trend evolution over time; (2) poor fit of residuals unamenable to correction via autoregressive or moving average models; (3) high sensitivity to selected period; and (4) divergent or even contradictory subperiod estimates relative to that of a larger encompassing interval. The latter two issues were particularly relevant in AR5 Section 2.4.3's discussion of the “observed reduction in warming trend” over 1998–2012 compared to 1951–2012 (Rahmstorf et al., 2017; Risbey et al., 2018). A smoothing spline nonlinear trend fit was demonstrated to address these factors, and later studies presented alternative estimators for continuous long‐term ΔGMST trends (Cahill et al., 2015; Mudelsee, 2019; Peng‐Fei et al., 2015; Visser et al., 2018).

An issue of particular concern is that linear trends underestimate long‐term (>100 years) ΔGMST compared to other estimates. For example, IPCC AR5 Box 2.2 estimated HadCRUT4 1900–2012 trends of 0.075 ± 0.013°C decade⁻¹ and 0.081 ± 0.010°C decade⁻¹ for linear OLS and smoothing spline trends respectively. Generally, long‐term linear fit ΔGMST from 1880 to present is 0.05°C–0.10°C below nonlinear estimates (SR1.5 Table 1.2; Visser et al., 2018), although the spread in ΔGMST estimates between different data sets is commonly as wide as differences engendered by ΔGMST methodology. Ultimately, IPCC AR5 Box 2.2 recommended linear trends over nonlinear estimates, noting that HadCRUT4 OLS‐based long‐term ΔGMST lay within the 5%–95% uncertainty range from the smoothing spline. Nevertheless, as the IPCC enters the Sixth Assessment Report (AR6), a new method that supplements or supplants the traditional approaches could reduce known biases and address these shortcomings.

This work proposes a local regression technique (LOESS; Cleveland, 1979; Cleveland et al., 1992) with a ±20 year smoothing window for multidecadal analysis. We also provide statistical uncertainty and show that the fit residuals follow the assumed ARMA(1, 1) autocorrelation structure. The framework can be extended to give self‐consistent ΔGMST estimates with uncertainty over as little as 15 years, providing a potential alternative to linear fits over all intervals of interest.

However, here we focus on long‐term ΔGMST and associated carbon budgets, directly relating our estimates to approaches discussed in AR5 and SR1.5. We compare against the IPCC approaches of OLS (1880–latest year) and period mean differences (from “preindustrial” reference period 1850–1900 to the latest decade), as well as a global warming index which SR1.5 used as the main estimate of “human‐induced warming” (Haustein et al., 2017). We also test the performance of our LOESS estimates using output from the two model large ensembles with simulations that begin in 1850. Our final comparison is with the new CMIP6 model ensemble, and using a subset of this ensemble we derive a modest conversion factor to update our observation‐based ΔGMST to ΔGSAT for carbon budget calculations.

The study is structured as follows. Section 2.1 describes source data from observations and associated estimated radiative forcings (2.1.1), two large model ensembles (2.1.2) and CMIP6 models (2.1.3). Section 2.2 describes trend estimation (2.2.1), evaluation of ΔGMST methods and performance (2.2.2), large model ensemble evaluation (2.2.3) and ΔGSAT and carbon budget calculation (2.2.4). We present our results in Section 3, covering long‐term ΔGMST analysis (3.1), large model ensemble analysis (3.2) and ΔGSAT and associated remaining carbon budgets (3.3). Finally, in Section 4, we discuss our results and issue recommendations for the use of ΔGMST and ΔGSAT in future IPCC assessments.

Source Data and Methods

Source Data

Global Surface Temperature Data Sets

Typically, gridded monthly land surface air temperature (LSAT) and sea‐surface temperature (SST) anomalies are generated and then blended to produce GMST. Table 1 summarizes five blended LSAT‐SST series in widespread use. There is considerable overlap in the underlying data sets. There are two SST data sets: HadSST3 (Kennedy et al., 2011a, 2011b) and NOAA's ERSSTv5 (Huang et al., 2017), and three LSAT data sets: GHCNv4 (Menne et al., 2018), CRUTEM4 (Jones et al., 2012), and Berkeley Earth (Rohde et al., 2013). Even this understates the overlap; for example, both SST data sets rely primarily on the comprehensive store of maritime observations from the International Comprehensive Ocean‐Atmosphere Data Set (ICOADS; Freeman et al., 2016), albeit processed, filtered and supplemented in different ways. Note, however, that each group's quality assurance and data homogenization procedures, and the associated uncertainties, differ in important ways in both the land and SST data sets. In particular, bias adjustments of SST data to account for differences between buoy, engine intake and bucket measurements can have a notable effect on long‐term trends (Kennedy et al., 2019).

Table 1. Five Operational Observational Data Sets

Series	Land (LSAT)	Ocean (SST)	Interpolation	Averaging	Start year
HadCRUT4 (Morice et al., 2012)	CRUTEM4	HadSST3	None	Simple average of hemispheric area‐weighted averages	1850
NOAA GlobalTemp v5 (Huang et al., 2020; Zhang et al., 2019)	GHCNv4	ERSSTv5	Empirical orthogonal teleconnections (EOTs)	Area weighted average	1880
NASA GISTEMP v4 (Lenssen et al., 2019)	GHCNv4	ERSSTv5	Distance weighting (to 1,200 km)	80 zones x 100 subboxes	1880
Cowtan‐Way v2 (Cowtan & Way, 2014a, 2014b; Cowtan et al., 2015)	CRUTEM4 (kriged)	HadSST3 (kriged)	Kriging (Complete)	Area weighted average	1850
Berkeley Earth (Rohde & Hausfather, 2020)	Berkeley Earth	HadSST3 (reprocessed and kriged)	Kriging (to ∼2,500 km)	Area weighted average	1850

Differences in spatial interpolation can affect calculated GMST. HadCRUT4 calculates area‐weighted hemispheric means with no interpolation between its 5° × 5° grid boxes, combined in a “simple” (equally weighted) average. In contrast, NASA GISTEMP, Cowtan‐Way and Berkeley Earth use extensive interpolation and, crucially, extrapolate LSAT over sea ice. Cowtan‐Way interpolates HadCRUT4 to produce 100% apparent coverage, while GISTEMP and Berkeley Earth both interpolate up to 1,200 km from observations, resulting in virtual areal coverage two to three times that of HadCRUT4 in the late 19th century. Nominal coverage in all three data sets is virtually complete since 1951 (see Figure S1). Reducing Cowtan‐Way coverage to that of Berkeley Earth results in imperceptible differences in GMST even in the 19th century, indicating that distance‐limited and unlimited kriging interpolation can be considered equivalent (see Figure S14). Spatial smoothing via empirical orthogonal teleconnections (EOTs; van den Dool et al., 2000) in NOAA GlobalTemp (and ERSSTv5) results in nominal coverage between that of HadCRUT4 and NASA GISTEMP, but largely misses very high latitudes and has no interpolated coverage over Arctic sea ice.
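The effect of averaging choices on an incompletely sampled field can be illustrated with a small sketch. The grid, the anomaly values and the function names below are hypothetical and chosen only for illustration; this is not the code used by any of the data set providers.

```python
import numpy as np

def area_weighted_mean(field, lats):
    """Cos(latitude)-weighted mean of a (lat, lon) field, ignoring NaN cells."""
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    valid = ~np.isnan(field)
    return np.sum(field[valid] * weights[valid]) / np.sum(weights[valid])

def hemispheric_simple_mean(field, lats):
    """Equally weighted average of the two area-weighted hemispheric means."""
    north = area_weighted_mean(field[lats > 0], lats[lats > 0])
    south = area_weighted_mean(field[lats < 0], lats[lats < 0])
    return 0.5 * (north + south)

# Hypothetical 5-degree grid: 1.0 degC everywhere, 3.0 degC poleward of 65N.
lats = np.arange(-87.5, 90.0, 5.0)
lons = np.arange(-177.5, 180.0, 5.0)
full = np.ones((lats.size, lons.size))
full[lats > 65] = 3.0

# Remove the Arctic cells to mimic missing coverage with no interpolation.
masked = full.copy()
masked[lats > 65] = np.nan

full_mean = area_weighted_mean(full, lats)            # complete coverage
masked_mean = hemispheric_simple_mean(masked, lats)   # unsampled Arctic
```

In this toy case the masked average falls below the complete‐coverage average, because leaving out the fast‐warming cells implicitly assigns them the mean of the sampled area.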

Comparisons with temperature reanalyses, independent surface data and satellite retrievals show that interpolation significantly mitigates coverage bias (and associated underestimation of warming) arising from poor sampling of the fastest warming areas, especially the Arctic, since the mid‐twentieth century (Cowtan et al., 2018; Dodd et al., 2015; Lenssen et al., 2019; Susskind et al., 2019). Evidence is mixed for earlier periods where reduced coverage leads to larger interpolation uncertainty (Cowtan et al., 2018) and differences between underlying SST data sets are the largest source of discrepancies. Cowtan et al. (2018) showed that both generalized least squares averaging and kriging interpolation mitigated errors engendered by “naïve” global or hemispheric averaging methods, such as those used in HadCRUT4, which implicitly set “missing” areas to the global average of sampled areas (Hansen et al., 2006). Thus, the three interpolated data sets are demonstrably more representative of global climate change.

We use the published monthly anomaly series, except for Berkeley Earth where we use the area‐weighted average of the gridded series, which diverges from the published series over 1850–1950 (Figures S2 and S3). For series starting in 1850 anomalies are relative to 1850–1900 while NASA GISTEMP and NOAA GlobalTemp are baselined such that their 1880–1900 mean matches that of the three longer‐running data sets. These rebaselined NASA and NOAA series are used for all ΔGMST estimates calculated relative to 1850–1900 as outlined in Section 2.2.1. This streamlined and consistent scheme replaces multiple IPCC SR1.5 approaches based on scaling their 1880–2015 trends or matching to HadCRUT4 over 1880–1990. We also report the mean ΔGMST for all five operational data sets (OpAll group) and the subset of three data sets with near‐global interpolated coverage post‐1950 (Global_3 group), with the latter used as the basis for our main estimates. Group ΔGMST estimates are the mean of the individual estimates as in IPCC AR5.
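The rebaselining scheme can be sketched as follows. The series are synthetic and the helper name `rebaseline` is hypothetical; the sketch only illustrates matching the 1880–1900 mean of a shorter series to that of a longer series already expressed relative to 1850–1900.

```python
import numpy as np

def rebaseline(years, series, start, end, target_mean=0.0):
    """Shift a series so its mean over [start, end] equals target_mean."""
    window = (years >= start) & (years <= end)
    return series - series[window].mean() + target_mean

# Synthetic "long" series starting in 1850, expressed relative to 1850-1900.
years_long = np.arange(1850, 2020)
long_raw = 0.005 * (years_long - 1850)
long_rb = rebaseline(years_long, long_raw, 1850, 1900)

# Synthetic "short" series starting in 1880 with an arbitrary offset.
years_short = np.arange(1880, 2020)
short_raw = 0.005 * (years_short - 1850) + 0.3

# Match the short series' 1880-1900 mean to that of the long series.
overlap = (years_long >= 1880) & (years_long <= 1900)
target = long_rb[overlap].mean()
short_rb = rebaseline(years_short, short_raw, 1880, 1900, target)
```

With several longer‐running data sets, `target` would be the mean of their 1880–1900 averages.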

We augment temperature data with summarized anthropogenic and natural radiative forcing data required to derive the “global warming index” referenced in SR1.5 as a potential alternative to ΔGMST for tracking anthropogenic warming (Allen et al., 2018; Haustein et al., 2017). These are used to estimate anthropogenic and natural forced changes, ΔGMST_F,anthro and ΔGMST_F,nat, using a two‐box impulse‐response model with parameters derived from a least‐squares‐fit between observed temperatures and the modeled response (Haustein et al., 2017; Otto et al., 2015). These estimates are used to assess the characteristics of a particular LOESS window choice (Section 2.2.1) and as an additional comparator to long‐term ΔGMST.

Model Large Ensembles

We perform tests using output from the large ensembles whose simulations begin in 1850: the Max Planck Institute for Meteorology Grand Ensemble (MPI‐GE, N = 100, Maher et al., 2019) and Commonwealth Scientific and Industrial Research Organisation Mk3.6.0 (CSIRO Mk3.6.0, N = 30; Jeffrey et al., 2013; Rotstayn et al., 2012), taking their GSAT over historical‐RCP8.5 simulations for 1850–2019 and baselining each to 1850–1900. We exclude five other large ensembles that start after 1850 (Deser et al., 2020), and our approach is conceptually similar to that in Dessler et al. (2018)'s estimation of how internal variability affects derived climate sensitivity in MPI‐GE. The use of GSAT simplifies the calculations and since the year‐to‐year variability in GSAT‐GMST difference is of order 0.01°C in CMIP5 models (e.g., Figure 2 of Cowtan et al., 2015), we expect little effect of blending or masking on this particular analysis.

Conceptually, we first decompose ΔGSAT as: $Δ{GSAT} = Δ{GSAT}_{F} + Δ{GSAT}_{var}$ (1) where ΔGSAT_var represents internal variability and ΔGSAT_F the forced response. The same decomposition would apply for ΔGMST. We adopt the IPCC SR1.5 argument that “[s]ince 2000, the estimated level of human‐induced warming has been equal to the level of observed warming with a likely range of ±20%.” From this it follows that a reliable estimate of ΔGMST_F through 2019 would be an appropriate estimate of human‐induced warming, ΔGMST_F,anthro, with relevance for temperature targets and carbon budgets. With just one realization of real‐world internal variability we cannot perform this decomposition, but a large ensemble mean should approach that model's ΔGMST_F. We test whether our derived ΔGMST_LOESS approximates ΔGMST_F, and consider the decomposition in an individual run to be: [Image Omitted. See PDF] With a ±20‐year window, this effectively decomposes between short‐ and long‐term ΔGMST. If periods are selected to minimize volcanism (which induces short‐term ΔGMST_F), and the magnitude of ΔGMST_var is small at 40‐year timescales, then the resultant $Δ{GMST}_{LOESS} \approx Δ{GMST}_{F,anthro}$ over the long‐term intervals of interest.
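The idea that a large ensemble mean approaches the forced response can be sketched with synthetic data. The forced curve, the noise level and the ensemble size below are hypothetical, and white noise stands in for internal variability purely for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1850, 2020)

# Hypothetical forced response (degC) and 100 members with independent noise.
forced_true = 1.2 * ((years - 1850) / 169.0) ** 2
members = forced_true + rng.normal(0.0, 0.1, size=(100, years.size))

# The ensemble mean estimates the forced response; what remains in each
# member is that member's internal variability.
forced_est = members.mean(axis=0)
variability = members - forced_est
```

The error of `forced_est` shrinks roughly as one over the square root of the ensemble size, which is why ensembles of 30 to 100 members are useful for this test.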

Coupled Model Intercomparison Project, Phase 6 (CMIP6) Output

We include historical simulations over 1850–2014 from CMIP6 models which have the required fields for blending surface air temperatures (SAT) over land or sea ice and SST over ocean (Eyring et al., 2016), permitting “apples‐to‐apples” comparisons with land‐ocean observational data sets and derivation of a ΔGMST_LOESS to ΔGSAT_LOESS adjustment. These include near‐surface air temperature (“tas”), sea‐surface temperature (“tos”) and sea ice concentration (“siconc” or “siconca”; N = 24 simulations listed in Table S1).

Following Cowtan et al. (2015) and Richardson et al. (2018), each simulation is processed to produce two area‐weighted average series: (1) global SAT (i.e., GSAT) and (2) global blended SAT‐SST (i.e., GMST). At each grid cell i, j, the blended monthly temperature T_blend,i,j is: $T_{blend,i,j} = w_{SAT,i,j} T_{SAT,i,j} + (1 - w_{SAT,i,j}) T_{SST,i,j}$ (3) where w_SAT,i,j is the land plus sea ice grid cell fraction, and T_SAT,i,j and T_SST,i,j are the local anomalies relative to 1850–1900. For GSAT w_SAT,i,j = 1 everywhere, and for the blended GMST series w_SAT,i,j = 1 in ocean cells for a calendar month if any of those months during 1961–2014 has siconc > 1%. This is similar to the Cowtan‐Way blending algorithm and the “xaf” simulations in Cowtan et al. (2015).
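A minimal sketch of the per‐cell blending and area‐weighted averaging, assuming the blend is a weighted sum of the air and sea‐surface anomalies with weight w_SAT on the air temperature. The 3 × 2 grid and its values are hypothetical.

```python
import numpy as np

def blend_sat_sst(t_sat, t_sst, w_sat):
    """Per-cell blend: air temperature weighted by w_sat, SST by the remainder."""
    return w_sat * t_sat + (1.0 - w_sat) * t_sst

def global_mean(field, lats):
    """Cos(latitude)-weighted mean of a complete (lat, lon) field."""
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    return np.sum(weights * field) / np.sum(weights)

# Hypothetical anomalies (degC relative to 1850-1900) on a 3 x 2 grid.
lats = np.array([-60.0, 0.0, 60.0])
t_sat = np.array([[1.0, 1.2], [0.9, 1.1], [1.8, 2.0]])
t_sst = np.array([[0.8, 0.9], [0.8, 0.9], [1.0, 1.1]])
w_sat = np.array([[0.0, 0.1], [0.0, 1.0], [1.0, 0.5]])  # land plus sea ice fraction

gmst = global_mean(blend_sat_sst(t_sat, t_sst, w_sat), lats)
gsat = global_mean(blend_sat_sst(t_sat, t_sst, np.ones_like(w_sat)), lats)
```

Setting the weight to 1 everywhere recovers GSAT, so the same function yields both series from one simulation.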

Methods

Next we describe our approach to obtain ΔGMST, our uncertainty estimation, and the remaining carbon budget calculation. Section 2.2.1 explains the trend fits and errors; Section 2.2.2 explains the ΔGMST calculations, observational error and methods by which the fit quality is judged using observational data. Section 2.2.3 discusses the large ensemble methodology, and Section 2.2.4 the CMIP6 comparison and carbon budget calculation. We use ΔGMST and ΔGSAT to refer to a general change in global temperature, and use qualifiers or subscripts when referring to statistical estimation methods or to components of the change. For example, LOESS_bsln ΔGMST (or ΔGMST_LOESS) refers to an estimate made with LOESS, while ΔGMST_F refers to the forced component.

Trend Calculations and Their Statistical Uncertainty

For a series of n temperature observations x_i at time t_i, a linear trend is: $x_{i} = a + b t_{i} + e_{i}$ (4) where a and b are intercept and slope parameters to be fitted and e_i are residual errors. The slope estimate $\hat{b}$ is used to obtain ΔGMST as $\hat{b}$ (t_n − t_1), with the uncertainty of $\hat{b}$ (and thus ΔGMST) determined as explained below.
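As a simple illustration, the OLS‐based ΔGMST is the fitted slope multiplied by the length of the interval. The sketch below uses `numpy.polyfit` on a synthetic, noise‐free series; the function name and the input values are hypothetical.

```python
import numpy as np

def ols_delta(t, x):
    """Linear-trend change over the full interval: slope times (t_n - t_1)."""
    slope, intercept = np.polyfit(t, x, 1)
    return slope * (t[-1] - t[0])

# Synthetic annual series, 1880-2019, warming at 0.008 degC per year.
years = np.arange(1880, 2020, dtype=float)
gmst = 0.2 + 0.008 * (years - 1880)

delta_ols = ols_delta(years, gmst)
```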

Our multidecadal LOESS point‐to‐point (LOESS_md) ΔGMST is based on the LOESS fit from 1880 to 2019; for any starting point, ΔGMST to 2019 is the LOESS_md fit evaluated in 2019 minus the start value. We also introduce “baseline” LOESS (LOESS_bsln) as our main ΔGMST estimate. LOESS_bsln is simply the same fit evaluated at the end year, yielding an estimate relative to 1850–1900 baseline, rather than to a given start year such as 1880. Although the central estimated fit is the same, the associated statistical fit uncertainties are quite different, as explained below.

Our LOESS_md uses a fixed span α_md of ±20 years, tricube weighting (the default) and a degree 1 smoothing parameter (i.e., locally weighted linear trend, which yields more stable end points). Tests with the Cowtan‐Way series show that α of ±10 years captures internal decadal variability and has marked sensitivity to volcanic episodes early in the record and to a lesser extent over 1980–2019 (Figure S4). On the other hand, α of ±20 or ±30 years smooths out short‐term variability, and both show similar warming from 1850–1900 to present: 1.12°C (±20 years) or 1.11°C (±30 years). Analysis of first differences for each LOESS window (Figure S5) shows large variance with α of ±5 years, which stabilizes with α of ±20, ±25 or ±30 years. Large ensemble tests support this choice: α_md substantially smaller than ±20 years increases ΔGMST_F discrepancy, while α_md substantially longer than ±20 years introduces a low bias in 1850–2019 ΔGMST (Figures S6 and S7). We therefore choose α_md = ±20 years to evaluate trends of length ≥30 years; LOESS_pent (α = ±5 years) is reserved for future extension of our framework to cover very short‐term trends of ≤15 years (see Figure S4d).
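A minimal sketch of a locally weighted linear fit with tricube weights and a fixed ±20‐year window is given below. It is written from the description above rather than from the authors' code, so the handling of the window (a fixed half‐width in years instead of a nearest‐neighbor fraction) and the function name are assumptions.

```python
import numpy as np

def loess_fixed_window(t, x, half_window=20.0):
    """Degree-1 local regression with tricube weights inside +/- half_window."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    fit = np.empty_like(x)
    for k, t0 in enumerate(t):
        d = np.abs(t - t0) / half_window
        w = np.where(d < 1.0, (1.0 - d**3) ** 3, 0.0)   # tricube kernel
        sw = np.sqrt(w)
        # Weighted least squares for a + b (t - t0); the fitted value at t0 is a.
        design = np.column_stack([np.ones_like(t), t - t0]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(design, x * sw, rcond=None)
        fit[k] = coef[0]
    return fit

# Synthetic annual series: a linear trend is reproduced exactly by a local line.
years = np.arange(1850, 2020, dtype=float)
linear = 0.008 * (years - 1850)
fit = loess_fixed_window(years, linear)

# Point-to-point change from 1880 to the final year of the fit.
delta_md = fit[-1] - fit[years == 1880][0]
```

Near the ends of the record the window becomes one‐sided, which is where the fit uncertainty is largest.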

Default methods assume statistically independent noise, necessitating an uncertainty correction if the fit residuals are autocorrelated. Santer et al. (2008) presented a procedure for assessing an effective sample size (and associated reduction in degrees of freedom) from the general formula [Image Omitted. See PDF]where $ρ_{j}$ is the autocorrelation function of a noise model estimated from the fit residuals. If the noise follows an autoregressive(1) (AR(1)) process, then with $ρ_{j} = ϕ^{j}$ . [Image Omitted. See PDF]where $ϕ$ is estimated from the lag‐one autocorrelation coefficient (Mitchell et al., 1966). However, Foster and Rahmstorf (2011) demonstrated that 1979–2010 GMST trend residuals were more consistent with an autoregressive moving average, ARMA(1, 1) model in the form [Image Omitted. See PDF] [Image Omitted. See PDF]

Substituting Equation 7 into Equation 6 yields [Image Omitted. See PDF]
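The effective sample size correction can be sketched under two stated assumptions, both standard time series results rather than transcriptions of the equations in this section: the large‐sample form n_e = n_t / (1 + 2 Σ ρ_j), and ARMA(1, 1) autocorrelations of the form ρ_j = ρ_1 ϕ^(j−1). Function names are hypothetical.

```python
import numpy as np

def arma11_rho1(phi, theta):
    """Theoretical lag-one autocorrelation of an ARMA(1, 1) process."""
    return (1.0 + phi * theta) * (phi + theta) / (1.0 + 2.0 * phi * theta + theta**2)

def n_eff_arma11(n, rho1, phi):
    """Effective sample size when rho_j = rho1 * phi**(j - 1)."""
    return n / (1.0 + 2.0 * rho1 / (1.0 - phi))

def n_eff_ar1(n, phi):
    """Effective sample size for AR(1), where rho_j = phi**j."""
    return n * (1.0 - phi) / (1.0 + phi)

# Numerical check of the geometric-sum simplification (hypothetical values).
n, rho1, phi = 1000, 0.6, 0.8
lags = np.arange(1, n)
n_eff_direct = n / (1.0 + 2.0 * np.sum(rho1 * phi ** (lags - 1)))
```

With θ = 0 the ARMA(1, 1) expressions reduce to the AR(1) case, since ρ_1 then equals ϕ.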

Foster and Rahmstorf used the Yule‐Walker “method of moments” with $\hat{ϕ} = {\hat{ρ}}_{2} / {\hat{ρ}}_{1}$ . Hausfather et al. (2017) instead used Maximum Likelihood Estimation (MLE) to obtain $\hat{ϕ}$ and $\hat{θ}$ and then ${\hat{ρ}}_{1}$ via Equation 6. Monte Carlo simulations show that MLE gives a more robust and efficient estimator $\hat{ϕ}$ , suitable for series as short as 8 years (see Figure S8). Hausfather et al. (2017) also introduced a bias correction to account for underestimated autocorrelation in shorter series, derived from AR(1) in Tjøstheim and Paulsen (1983) and extended to account for the positive difference between $\hat{ϕ}$ and ${\hat{ρ}}_{1}$ . [Image Omitted. See PDF]
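The method‐of‐moments estimate can be illustrated on simulated noise. For an ARMA(1, 1) process the autocorrelations decay geometrically after lag one, so ϕ can be estimated as the ratio of the lag‐two to the lag‐one sample autocorrelation. The parameter values, series length and function names below are hypothetical, and the sketch does not reproduce the MLE approach or the bias correction.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def arma11_moments(resid):
    """Method-of-moments estimates (phi_hat, rho1_hat) for ARMA(1, 1) noise."""
    rho1, rho2 = sample_acf(resid, 2)
    return rho2 / rho1, rho1

# Simulate a long ARMA(1, 1) series with known (hypothetical) parameters.
rng = np.random.default_rng(42)
phi_true, theta_true = 0.8, -0.3
e = rng.normal(size=200_000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, e.size):
    x[t] = phi_true * x[t - 1] + e[t] + theta_true * e[t - 1]

phi_hat, rho1_hat = arma11_moments(x)
```

On short series this estimator becomes noisy, which is consistent with the preference for MLE stated above.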

Although this bias correction is most pertinent for very short series, Monte Carlo simulations have demonstrated its relevance for highly autocorrelated series up to 720 months in length. We selected this bias correction after comparison with alternatives (e.g., Nychka et al., 2000; see Figure S9).

Substituting the bias corrected parameters and simplifying the correction term as in Equation 5 yields the final effective length correction. [Image Omitted. See PDF]

We estimate corrections from the residuals of both LOESS and OLS. To apply this correction, we define nominal degrees of freedom v = n_t − p and effective degrees of freedom v_e = n_e − p, where p is the number of actual or equivalent parameters of the trend fitting methodology.

In the linear case, the correction is applied directly to s_b, the standard error of b in Equation 4, with p = 2. [Image Omitted. See PDF]

For nonparametric trend estimation like LOESS, Monte Carlo simulations can establish uncertainties, as in Visser et al. (2018) for smoothing spline trends. Here, we propose a plausible heuristic method. First, the above correction is applied to s_e, the standard errors of the residual fit, with p set to the equivalent number of parameters of the LOESS trend, derived from the trace of the LOESS projection matrix (Cleveland & Grosse, 1991); generally p ≈ 2/α + 0.5 for GMST data sets. For an equally spaced time series, s_e is maximum at the start and end of the LOESS fit. If statistical errors at these two points are independent, they may be combined in quadrature, by taking the square root of the sum of the squared standard errors, i.e., the square root of the sum of variances (see also Eq. S4 in Karl et al., 2015). Then the corrected standard error $s_{Δ T_{n}}^{'}$ for ΔGMST_n becomes [Image Omitted. See PDF]

For both OLS and LOESS_md we evaluate the sample autocorrelation function (ACF) of the fit residuals as well as the ACFs of the ARMA(1, 1) and AR(1) noise models fit to those residuals.

Finally, for LOESS_bsln we assume that the mean error during the 1850–1900 baseline is small relative to the end point error. We are not aware of any formal method for calculating the required adjustment, so we generate an ad hoc correction tuned to perform well in Monte Carlo tests. To approximate the baseline uncertainty, we take the LOESS_md start point uncertainty, $\max (s'_{e})$ , and reduce it according to the relative length of the baseline by applying an appropriate factor b_adj. This is similar in principle to the reduction of sample mean uncertainty with increasing sample size; in this case, b_adj is tuned to reproduce the results of Monte Carlo tests with Cowtan‐Way data. For a baseline t₁ to t_b, with b ≤ n/2, where n is the length of the full series we take (while also imposing a lower limit on b_adj): [Image Omitted. See PDF]

Following quadrature the combined LOESS_bsln error is then: [Image Omitted. See PDF]and Equation 12 is a special case of Equation 14 with a baseline of length 0 and b_adj = 1. Monte Carlo simulations of LOESS fits plus ARMA(1, 1) noise produce a probability distribution function nearly identical to that engendered in Cowtan‐Way by Equation 12 over 1880–2019 and by Equation 14 from 1850–1900 and 1880–1900 to 2019 (Figures S10 and S11).

Estimates of Observational ΔGMST, Error Components and Performance Tests

The main analysis focuses on long‐term ΔGMST (results for other IPCC AR5 periods are in Table S2). In addition to OLS and LOESS_md ΔGMST over 1880–2019, and LOESS_bsln from 1850–1900 to 2019, we also calculate period difference ΔGMST estimates by subtracting mean GMST over 1850–1900 from that of the most recent decade, 2010–2019. The above are also compared to GMST‐derived estimates of anthropogenic warming (Haustein et al., 2017; Section 2.1.1) and to a CMIP6 ensemble (Section 2.2.4). Global_3 and OpAll group ΔGMST are the mean of individual data set ΔGMST.

Following standard IPCC practice, we report the 5%–95% statistical uncertainty range for LOESS and OLS ΔGMST estimates, as outlined in Section 2.2.1. Group uncertainties are reported conservatively and go from the smallest 5% to the largest 95% reported for any of their constituent data sets. We also report observational parametric uncertainty as the 5%–95% range of ΔGMST values derived from each of the 100‐member HadCRUT4 and Cowtan‐Way ensembles. These ensembles use a Monte‐Carlo method to assess the fully correlated errors engendered by parametric uncertainty related to bias adjustments to individual temperature readings (Kennedy et al., 2011).

Figure S12 depicts these estimates and derived autocorrelation functions (ACF) for the Cowtan‐Way monthly series with ARMA(1, 1) correction and for Cowtan‐Way annual series with AR(1) correction (similar to IPCC AR5).

Finally we assess LOESS_bsln ΔGMST against period mean differences for the Global_3 group by evaluating at the mid‐point of the corresponding end decade; for example, LOESS_bsln at the end of 2014 is comparable to the 1850–1900 to 2010–2019 period ΔGMST. IPCC SR1.5 explicitly considered their 1850–1900 to 2006–2015 ΔGMST estimate to be a proxy of the eventual 1996–2025 mean. We therefore compare the ΔGMST estimates for every year from 1995 against centered 20‐year and 30‐year means. We also compare to “extended” running 30‐year periods, generated by assuming a continuation of the 1990–2019 linear trend through 2029. We argue that a smaller bias and root mean square error (RMSE) relative to the 20‐ and 30‐year means represents better performance according to the IPCC's own criterion.

Large Ensemble Analysis for Method Validation and Uncertainty Calculation

LOESS_bsln is fit to the 1850–2019 annual output for each simulation, then the ΔGMST_LOESS through 2019 is evaluated from all start years 1850–1980. Separate linear OLS fits ending in 2019 are also obtained for those start years. We also evaluate LOESS_bsln at the end of 2014 and compare with the 1850–1900 to 2010–2019 period ΔGMST (which we henceforth refer to as ΔGMST_period). Finally, LOESS_md is calculated over 1880–2019 for each simulation. The distribution of ensemble member ΔGMST‐ΔGMST_F provides an estimate of the bias and uncertainties for each estimator and each period, as argued in Section 3.2. If ΔGMST_LOESS ≈ ΔGMST_F then the LOESS residuals will be dominated by internal variability and our statistical uncertainty is related to error due to internal variability (we confirmed that the model residuals generally follow our assumed ARMA(1, 1); Figure S13). The LOESS decomposition filters in time: ΔGMST_F excursions shorter than our window will inflate statistical uncertainty, while multidecadal ΔGMST_var changes will be included in ΔGMST_LOESS and result in errors that are too small. We compare each run's statistical uncertainties with the ensemble 17%–83% and 5%–95% ranges to check for evidence that the observation‐derived statistical uncertainties could represent internal variability in the 1850–1900 to 2019 ΔGMST_LOESS used for carbon budget calculations (see Section 2.2.4).

CMIP6 Comparisons, GSAT Adjustment and Remaining Carbon Budget

IPCC SR1.5 reported remaining carbon budgets accounting for warming to date, but did not directly use the reported ΔGMST_period 5%–95% observational uncertainty from individual data sets. Instead AR5 5%–95% observational uncertainty through 1986–2005 was combined with additional uncertainties to produce a “likely” 17%–83% ΔGMST total uncertainty, and ΔGMST_period was then converted to ΔGSAT_period using a CMIP5‐derived scaling. This section describes the comparison with CMIP6 ΔGMST_period and conversion of observed LOESS_bsln ΔGMST to ΔGSAT, and then details the carbon budget calculation, which largely follows the IPCC SR1.5 methodology, as elaborated by Rogelj et al. (2019).

LOESS_bsln series are generated for each of the 24 individual full‐coverage CMIP6 air‐only (GSAT) and blended (GMST) series described in Section 2.1.3, with the blended series being comparable to quasi‐global GMST observations. We consider the full ensemble and also a sub ensemble of “likely ECS” models, excluding those with effective climate sensitivity (ECS) outside the CMIP5 1.9°C–4.5°C 90% ensemble range (Flato et al., 2013; Forster et al., 2019).

For each ensemble member's LOESS_bsln changes we derive a “blending” factor A_blend = ΔGSAT_LOESS/ΔGMST_LOESS, which represents the required adjustment to convert ΔGMST_LOESS to ΔGSAT_LOESS, accounting for the difference between GSAT air temperatures and GMST “blending” of air and water temperatures. The median and ensemble distribution of A_blend scaling factors is applied to observed ΔGMST_LOESS to obtain historical observed ΔGSAT_LOESS with combined uncertainty for calculating the remaining carbon budget, as detailed below. The carbon budget calculation largely follows the framework established in IPCC SR1.5 (Rogelj et al., 2018), elaborated by Rogelj et al. (2019) and implemented by Nauels et al. (2019). We simplify the Rogelj et al. (2019) remaining carbon budget equation to: $B_{lim} = (Δ{GSAT}_{lim} - Δ{GSAT}_{F,anthro} - Δ{GSAT}_{{nonCO}_{2},fut}) / TCRE - E_{Esfb}$ (15) where B_lim is the remaining carbon budget associated with a temperature limit ΔGSAT_lim (1.5°C or 2°C), with ΔGSAT_F,anthro (also referred to as ΔGSAT_hist) the historical human‐induced warming to date and $Δ{GSAT}_{{nonCO}_{2},fut}$ the expected future warming from nonCO₂ anthropogenic forcing. TCRE is the transient climate response to cumulative CO₂ emissions, while E_Esfb is an adjustment for Earth system feedbacks from permafrost thaw and warming wetlands. This is essentially the same framework as SR1.5, except that in SR1.5 nonCO₂ warming was not separate, but rather included in TCRE, and the Earth system feedback adjustment was incorporated in the results of SR1.5 Table 2.2, but not included in “headline” estimates in its Summary for Policymakers (IPCC, 2018).

In practice, observation‐based ΔGSAT_obs (whether ΔGSAT_period, ΔGSAT_LOESS or using another statistical technique) is used as an approximation of ΔGSAT_F,anthro, following from the finding that observed and “human‐induced” warming to date are approximately equivalent (Allen et al., 2018; Haustein et al., 2017). Thus, SR1.5 assessed ΔGSAT_F,anthro as 0.97°C in 2006–2015 relative to 1850–1900, based on the HadCRUT4 average for that decade (0.84°C) adjusted by the ratio between the equivalent CMIP5 blended‐masked estimate (0.86°C) and CMIP5 ΔGSAT (0.99°C), as stated in Box 2 of Rogelj et al. (2019).
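The SR1.5 adjustment quoted above can be checked with a few lines of arithmetic, using only the three values given in the text (variable names are ours).

```python
# Values quoted in the text (degC, 2006-2015 relative to 1850-1900).
hadcrut4_gmst = 0.84          # observed HadCRUT4 decade average
cmip5_blended_masked = 0.86   # CMIP5 estimate processed like HadCRUT4
cmip5_gsat = 0.99             # CMIP5 global surface air temperature

# Scale the observed value by the model GSAT to blended-masked ratio.
scaling = cmip5_gsat / cmip5_blended_masked
gsat_obs = hadcrut4_gmst * scaling
```

The ratio is about 1.15, and the scaled value rounds to the 0.97°C assessed in SR1.5.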

Here, we select the Global_3 GMST group and so do not need to rely on a model correction for the additional bias introduced by HadCRUT4's incomplete and changing geographic coverage, which necessitates a correction substantially larger than A_blend. Our central estimate for ΔGSAT_F,anthro is: $Δ{GSAT}_{F,anthro} = A_{blend\_med} \times Δ{GMST}_{Global\_3}$ (16) where A_blend_med is the median value from the CMIP6 A_blend ensemble and ΔGMST_Global_3 is the LOESS_bsln ΔGMST of the Global_3 group (based on the mean of LOESS_bsln applied to each of the three series). Note that this is a very conservative adjustment, as it may not fully account for coverage bias in the early part of the instrumental record, and ignores the “ice edge effect” cooling bias introduced by the variable sea ice mask in NASA GISTEMP and Berkeley Earth, which would add an additional ∼3% (Cowtan et al., 2015; Richardson et al., 2018).

SR1.5's likely total uncertainty in ΔGMST_obs (and derived ΔGSAT) was ±0.12°C. Here, we derive likely observation‐based ΔGSAT_LOESS using Gaussian approximations to the observational, data set spread and statistical fit uncertainties in the following steps (tests and details in Table S3):

1. The Cowtan‐Way ensemble spread is our best estimate of observational parametric ΔGMST uncertainty, so for each data set its standard deviation is combined in quadrature separately with (i) the data set‐specific statistical 1σ uncertainty and (ii) the CSIRO Mk3.6.0 large ensemble standard deviation.
2. For ΔGSAT, the CMIP6 A_blend ensemble standard deviation is taken as the uncertainty value, and combined in quadrature with the results of step 1.
3. We estimate a 17%–83% range by calculating those percentiles for each data set following a Gaussian assumption, that is, ±0.954σ from the mean, and then selecting the lowest 17% and highest 83% values from across the data sets.
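The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two placeholder series and all numerical inputs are hypothetical, and the A_blend contribution is assumed to be already expressed in °C.

```python
import math

Z_LIKELY = 0.954  # a Gaussian 17%-83% interval spans roughly +/-0.954 sigma


def combine_in_quadrature(*sigmas):
    """Root-sum-square of independent 1-sigma uncertainties."""
    return math.sqrt(sum(s * s for s in sigmas))


def likely_range(mean, sigma):
    """17%-83% range of a Gaussian with the given mean and sigma."""
    return (mean - Z_LIKELY * sigma, mean + Z_LIKELY * sigma)


def envelope(ranges):
    """Lowest lower bound and highest upper bound across data sets."""
    lows, highs = zip(*ranges)
    return (min(lows), max(highs))


# Hypothetical inputs in degC: central estimate, observational sigma,
# statistical-fit sigma. These are NOT values from the paper.
datasets = {
    "series_a": (1.12, 0.04, 0.06),
    "series_b": (1.19, 0.04, 0.05),
}
blend_sigma = 0.02  # hypothetical A_blend contribution, expressed in degC

ranges = [
    likely_range(mean, combine_in_quadrature(obs, stat, blend_sigma))
    for mean, obs, stat in datasets.values()
]
low, high = envelope(ranges)
print(f"combined 17%-83% range: {low:.2f} to {high:.2f} degC")
```

Taking the envelope rather than averaging the per‐series ranges is what makes step 3 conservative: the reported range can only widen as more data sets are included.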

There is no universally accepted method of accounting for data set spread. We adopt step 3 as a conservative approach; however, because we report the separate data set uncertainties as described in Section 2.2.2, other groups can replicate our results or develop alternative uncertainty estimates.

We take Rogelj et al. (2019)'s $T_{nonCO_2}$ of 0.1°C (0.2°C) for T_lim of 1.5°C (2°C), and E_Esfb of 100 Gt CO₂ through 2100. TCRE percentiles are based on AR5's likely range of 0.2°C–0.7°C per 1,000 Gt CO₂ (Collins et al., 2013), as in Nauels et al. (2019). SR1.5 included alternative carbon budgets using a lower T_hist from the average of the blended GMST data sets with no GSAT adjustment. Our alternative uses the Global_3 average without the GSAT adjustment. To contextualize the remaining budget against cumulative emissions to date we include data and uncertainties from the 2019 Global Carbon Budget (Friedlingstein et al., 2019).
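To make the role of each term concrete, the sketch below evaluates a remaining carbon budget of the Rogelj et al. (2019) form, B_lim = (T_lim − T_hist − T_nonCO₂)/TCRE − E_Esfb. It is an illustration only: the function name is hypothetical, the historical warming of 1.2°C is a placeholder, and the single TCRE value of 0.45°C per 1,000 GtCO₂ is simply the midpoint of the AR5 likely range, which is an assumption made here. The budgets reported in this study use TCRE percentiles and combined uncertainties, so this sketch is not expected to reproduce them.

```python
def remaining_budget(t_lim, t_hist, t_nonco2, tcre_per_1000gt, e_esfb):
    """Remaining carbon budget in GtCO2.

    t_lim, t_hist, t_nonco2 are in degC; tcre_per_1000gt is in degC per
    1000 GtCO2; e_esfb (Earth system feedback adjustment) is in GtCO2.
    """
    allowable_co2_warming = t_lim - t_hist - t_nonco2
    return allowable_co2_warming / tcre_per_1000gt * 1000.0 - e_esfb


# Placeholder historical warming and an assumed mid-range TCRE.
budget_mid = remaining_budget(
    t_lim=1.5, t_hist=1.2, t_nonco2=0.1, tcre_per_1000gt=0.45, e_esfb=100.0
)
# Same inputs with the upper end of the AR5 likely TCRE range.
budget_high_tcre = remaining_budget(
    t_lim=1.5, t_hist=1.2, t_nonco2=0.1, tcre_per_1000gt=0.7, e_esfb=100.0
)

print(f"mid-range TCRE: {budget_mid:.0f} GtCO2")
print(f"high TCRE:      {budget_high_tcre:.0f} GtCO2")
```

Because historical warming enters the numerator directly, each additional 0.01°C of assessed warming removes about 1/TCRE × 0.01°C of budget, roughly 20 GtCO₂ at the assumed mid‐range TCRE.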

Results

Long Term ΔGMST Analysis

Figure 1 compares LOESS_md and OLS ΔGMST from 1880 to 2019 with associated 5%–95% uncertainties (Figure 1a). Figure 1b shows that the LOESS fit residuals follow our assumed ARMA(1, 1), which is necessary to justify our error correction and is not true for OLS (Figure 1c). Our full set of observational long‐term ΔGMST estimates are given in Table 2.
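The comparison in Figure 1b rests on the autocorrelation function implied by an ARMA(1, 1) noise model. As a hedged illustration of that check, the sketch below computes the theoretical ARMA(1, 1) ACF and a sample ACF, and compares them on a synthetic series. The parameter values, the synthetic residuals and the function names are assumptions for illustration; they are not the fitted values or code used in this study.

```python
import numpy as np


def arma11_acf(phi, theta, nlags):
    """Theoretical ACF of x_t = phi*x_{t-1} + e_t + theta*e_{t-1}."""
    rho = np.empty(nlags + 1)
    rho[0] = 1.0
    if nlags >= 1:
        rho[1] = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta**2)
        for k in range(2, nlags + 1):
            rho[k] = phi * rho[k - 1]  # geometric decay beyond lag 1
    return rho


def sample_acf(x, nlags):
    """Sample autocorrelation of a series at lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


# Hypothetical parameters and synthetic "residuals", for illustration only.
phi, theta, n = 0.6, -0.2, 5000
rng = np.random.default_rng(0)
eps = rng.standard_normal(n)
x = np.zeros(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]

theory = arma11_acf(phi, theta, 5)
sample = sample_acf(x, 5)
for lag in range(6):
    print(f"lag {lag}: theoretical {theory[lag]:.3f}, sample {sample[lag]:.3f}")
```

If the residuals really follow the assumed process, the two columns track each other closely; a persistent mismatch, as for the OLS residuals in Figure 1c, indicates that the noise model and hence the uncertainty correction is not justified.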

Figure 1. 1880–2019 warming estimates from five GMST series. (a) LOESS (span ±20 years) and OLS trends with 5%–95% statistical fit uncertainty are shown for Cowtan‐Way (purple), NASA GISTEMP (blue), Berkeley Earth (orange), NOAA GlobalTemp (light blue) and HadCRUT4 (red) over 1880–2019. (b) The autocorrelation function (ACF) of the LOESS fit residuals is shown for each series (solid lines), along with the ACF of the estimated ARMA(1, 1) model used to correct for autocorrelation. (c) As in (b) except for OLS linear trend. GMST, Global mean surface temperature; OLS, Ordinary least squares.

Table 2. Observed Increase in GMST (°C) in Data Sets and Data Set Groupings

Data set	1850–1900 to 2019 (LOESS_bsln)	1850–1900 to 2010–2019 (latest decade)	1880–2019 (LOESS_md)	1880–2019 (linear)
HadCRUT4	1.02 [0.93–1.11] (0.97–1.07)	0.93 (0.88–0.98)	0.99 [0.88–1.11] (0.94–1.04)	0.96 [0.82–1.10] (0.92–1.03)
NOAA GlobalTemp	1.09 [0.98–1.19]	0.99	1.06 [0.93–1.18]	1.04 [0.89–1.19]
NASA GISTEMP	1.12 [1.02–1.22]	1.01	1.09 [0.98–1.21]	1.04 [0.88–1.20]
Cowtan & Way	1.12 [1.04–1.21] (1.05–1.19)	1.01 (0.95–1.09)	1.14 [1.03–1.25] (1.08–1.21)	1.02 [0.88–1.15] (0.94–1.09)
Berkeley Earth	1.19 [1.10–1.27]	1.08	1.20 [1.09–1.31]	1.09 [0.96–1.22]
All Operational (OpAll)	1.11 [0.93–1.27]	1.00	1.10 [0.88–1.31]	1.03 [0.82–1.22]
Full Global (Global_3)*	1.14* [1.02–1.27] (1.05–1.26)	1.03	1.14 [0.98–1.31]	1.05 [0.88–1.22]

Numbers in square brackets correspond to 5%–95% statistical fit uncertainty ranges, accounting for autocorrelation in fit residuals. Round brackets denote observational parametric uncertainty where available (HadCRUT4, Cowtan‐Way). NOAA and NASA are each aligned to match 1880–1900 mean of the other three data sets. Best estimates from three full global series are denoted by *. Group mean estimates (in bold) are given with uncertainties encompassing the spread from lowest 5% to highest 95%. For the Global_3 group, the observational uncertainty is from Cowtan‐Way, expanded by the spread of the three central estimates.

ΔGMST_OLS is always lower than ΔGMST_LOESS, with some central OLS ΔGMST estimates lying below the LOESS uncertainty range or nearly so (Cowtan‐Way, Berkeley Earth). Data sets are similarly ranked for both OLS and LOESS_md over 1880–2019, from HadCRUT4 (0.96, 0.99) to Berkeley Earth (1.09, 1.20). The Global_3 interpolated series exhibit a greater relative difference than the non‐global series; the Berkeley Earth and HadCRUT4 LOESS_md difference is 0.21°C, but only 0.13°C for OLS. Thus OLS not only yields lower ΔGMST, but also de‐emphasizes the differences between the data sets.

For LOESS_bsln to 2019, there are minor differences in assessed values but no changes in data set rankings versus LOESS_md 1880–2019. LOESS_bsln is generally ∼0.1°C higher than 1850–1900 to 2010–2019 ΔGMST, reflecting the 5‐year offset and ∼0.2°C/decade recent warming (2010–2019 is centered at the end of 2014). At 1.14°C, Global_3 LOESS_bsln ΔGMST to 2019 is 0.03°C higher than OpAll average, reflecting a 0.09°C difference with the mean of the two reduced coverage series from HadCRUT4 and NOAA GlobalTemp. The 1880–2019 LOESS_md discrepancy is even wider: 0.09°C for NOAA and 0.15°C for HadCRUT4. LOESS_bsln statistical fit uncertainties are smaller than LOESS_md or OLS, reflecting the smaller uncertainty of departure from the 1850–1900 mean rather than a single point (as noted in Section 2.2.2).
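The group values quoted above can be checked against the individual LOESS_bsln entries in Table 2. The sketch below assumes that each group estimate is the simple unweighted mean of its member series, and it works from the rounded table values, so differences between group means can deviate in the second decimal place from figures computed with unrounded inputs.

```python
# LOESS_bsln warming, 1850-1900 to 2019, in degC (central values from Table 2).
loess_bsln = {
    "HadCRUT4": 1.02,
    "NOAA GlobalTemp": 1.09,
    "NASA GISTEMP": 1.12,
    "Cowtan & Way": 1.12,
    "Berkeley Earth": 1.19,
}
GLOBAL_3 = ["NASA GISTEMP", "Cowtan & Way", "Berkeley Earth"]
REDUCED_COVERAGE = ["HadCRUT4", "NOAA GlobalTemp"]


def group_mean(names):
    """Unweighted mean of the listed series."""
    return sum(loess_bsln[name] for name in names) / len(names)


global_3_mean = group_mean(GLOBAL_3)
op_all_mean = group_mean(list(loess_bsln))
reduced_mean = group_mean(REDUCED_COVERAGE)

print(f"Global_3: {global_3_mean:.2f} degC")
print(f"OpAll:    {op_all_mean:.2f} degC")
print(f"Global_3 minus reduced-coverage mean: {global_3_mean - reduced_mean:.2f} degC")
```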

The observation‐based and CMIP6 blended ensemble LOESS_bsln (Figure 2a) show broadly similar changes: a rise to 1950, a 1950–1975 flattening, and strong post‐1975 warming. The observations show stronger 1920–1950 warming, especially in the three HadSST‐based series, and weaker post‐1975 warming.

Figure 2. GMST series and group surface warming estimates. (a) Monthly series and multidecadal LOESS_bsln ΔGMST (span ±20 years) are shown for HadCRUT4 (red), NOAA GlobalTemp (light blue), NASA GISTEMP (blue), Cowtan‐Way (purple) and Berkeley Earth (orange), together with OLS and period estimates from IPCC AR5 and SR1.5. NOAA GlobalTemp and NASA GISTEMP have been matched to the longer data sets over the overlapping 1880–1900 period. Also shown are 24 CMIP6 SAT‐SST model runs, blended following Cowtan et al. (2015) and Richardson et al. (2018). (b) LOESS_bsln (solid line with filled circle) is shown for two GMST groupings: Global_3 (purple) and OpAll (dark red). Also shown are selected additional warming estimates: anthropogenic following Haustein et al. (2017) (diamonds), decadal average (crosses) and OLS linear trend from 1880 (x‐crosses). Recent IPCC ΔGMST estimates are highlighted by large squares: AR5 OLS to 2012 (light blue) and SR1.5 2006–2015 mean extended to 2017 (blue), together with corresponding Global_3 LOESS_bsln ΔGMST (purple).

Separate tests showed that derived ΔGMST_LOESS was similar when restricting CMIP6 spatial coverage to that of Berkeley Earth, so we take the CMIP6 blended ensemble as directly comparable to the Global_3 series (Figure S14). The Global_3 rise of 1.14°C is just above the median CMIP6 estimate extended linearly to 2019, 1.12°C [0.91–1.41]. However, the Global_3 current trend of 0.21°C/decade (as estimated by the LOESS_bsln slope at the 2019 end point) is lower than CMIP6's 0.26°C/decade [0.18–0.38] or the likely ECS subensemble's 0.25°C/decade [0.18–0.29].

In general, Figure 2a shows LOESS_bsln ΔGMST from the five updated observational data sets (colored lines) are at or above recent IPCC long‐term observational ΔGMST estimates (represented by crosses and x‐crosses). Figure 2b affords a closer view of recent ΔGMST estimates, including group LOESS_bsln calculated to 2012 and 2017 for direct comparison to IPCC AR5 and SR1.5. As previously stated, AR5's main estimate of 0.85°C was from linear OLS on the data sets available then. Since the mean 1880–2012 OLS trend for OpAll is 0.89°C and LOESS_bsln is 0.93°C, ΔGMST methodology accounts for half of the discrepancy between AR5's 1880–2012 estimate and our OpAll based estimate. The 2012 gap is even wider for the Global_3 group. OLS to 2012 is 0.90°C and LOESS_bsln is 0.96°C; that gap continues to grow, reaching 0.09°C in 2019.

The SR1.5 2006–2015 mean ΔGMST_period of 0.87°C, centered at the end of 2010, was extended to the most recent year (2017) to provide a then current estimate of 1.0°C (Section 1.2.1.3 in Allen et al., 2018). The same extension to 2017 applied to the updated series shows a 0.03°C gap with LOESS_bsln evaluated in 2017. This discrepancy may be related to internal variability suppressing early 2000s warming; the period difference estimate based on the most recent decade then available (2008–2017) shows no such discrepancy with LOESS_bsln. Both LOESS_bsln and period estimates are in good agreement with the slightly higher Haustein human‐induced warming ΔGMST_F,anthro estimates.

Figure 3 compares Global_3 LOESS_bsln and period ΔGMST in more detail. Since IPCC SR1.5 explicitly considered the 2006–2015 mean as a proxy for the 1996–2025 average (relative to 1850–1900), we consider the centered 20‐year average and a 30‐year “extended” average assuming the current linear 30‐year trend continues over the next 15 years. We estimate that the 1979–2019 warming has been approximately linear (see Table S2 showing OLS‐LOESS agreement over this period), and the large ensembles also imply minor errors from assuming linearity through 2025. Figure 3a shows that in general LOESS_bsln departs less from the eventual 20 and 30 year average than the decade mean and confirms that 2006–2015 was affected by an early 2000s slowdown. LOESS_bsln has more stability relative to anthropogenic warming estimates (Figure 3b) with near‐identical concordance with ΔGMST_F,anthro since 2003, and has lower RMSE relative to the longer period averages since the late 1990s (Figures 3c and 3d).

Figure 3. ΔGMST estimation method validation based on average of three global series. (a) LOESS_bsln to 2019 (blue) is shown with 5‐year lagged LOESS (light blue), decadal average (red), 20‐year average (light gray) and 30‐year average (black). LOESS (light blue) versus decadal (red) differences are shown with (b) forced warming estimates following Haustein et al. (2017) and (c) validation targets (30‐year average, 30‐year average extended with linear trend and 20‐year average). (d) RMSE is calculated from errors shown in (c).

The equivalent performance evaluation of long‐term Global_3 LOESS_bsln versus OLS ΔGMST in Figure S15 shows a growing cool bias in OLS relative to the 20‐ and 30‐year average from 1995 on (Figure S15a) and thus much higher RMSE than LOESS_bsln relative to the longer period averages (Figure S15d).

Global_3 LOESS_bsln ΔGMST to 2019 is our main input for subsequent analysis such as remaining carbon budget, for which combined 17%–83% uncertainty is required; the combined statistical and observational uncertainty calculated following the method outlined in Section 2.2.4 yields Global_3 ΔGMST of 1.14°C [1.05–1.25].

Large Ensemble Validation

Figures 4a and 4d show the MPI‐GE and CSIRO Mk3.6.0 annual SAT range, individual LOESS_md fits and GSAT_F estimate; Figures 4b and 4e contain example LOESS and OLS fits to a single simulation; and Figures 4c and 4f show the forced, LOESS and OLS ΔGSAT estimates through 2019 for each start year from 1850 to 1980.

Figure 4. (a) MPI‐GE SAT outputs: the full ensemble range is shaded, each simulation's LOESS fit is in gray and the ensemble mean (our estimate of GSAT_F) is in red. (b) Example of fits applied to a single simulation (black) including LOESS (dark blue) and OLS over three different periods (straight lines) with GSAT_F in red. OLS lines are shifted up so that their end points correspond to the relevant ΔGMST for ease of comparison. (c) Calculated ΔGSAT for GSAT_F (red), based on the LOESS fit (dark blue) and based on OLS (cyan). For the fits, the lines are the ensemble median and the shaded regions the 5%–95% range. (d–f) As (a–c) but for the CSIRO Mk3.6.0 ensemble.

ΔGSAT_F and LOESS ΔGSAT agree well outside of periodic ΔGSAT_F spikes from volcanic eruptions, that is, when the forced change is smooth over our ±20 year window, such that ΔGSAT_LOESS ≈ ΔGSAT_F. For changes from the 19th century to the recent past, the IPCC AR5 estimates of solar forcing change are negligible compared with anthropogenic forcing, so longer‐term ΔGSAT_F should approximate the ΔGSAT_F,anthro used in our later carbon budget calculation. Meanwhile, OLS is biased relative to ΔGSAT_F in the long term, and is more sensitive to internal variability in the short term; for example, for 1990–2019 the OLS ensemble spread is 62% (MPI‐ESM) or 26% (CSIRO Mk3.6.0) larger than the LOESS ensemble spread.

Table 3 contains the large ensemble ΔGSAT estimates. For periods like 1850–1900 to 2010–2019, we use Section 2.2.2's LOESS_bsln approach while OLS is fit between the middle of each period. In both ensembles LOESS performs similarly to the period difference, with the 5th, 50th, and 95th percentiles of the ensemble LOESS and period difference calculations all agreeing to within 0.02°C. LOESS slightly outperforms centered period differences evaluated from 1850–1900 to end periods ranging from 1986–1995 through 2010–2019 when validated against the 30‐year average (see Figure S16). This validates LOESS performance, and Table 3 shows an advantage over period means since the LOESS calculation can be extended to the latest available year without greatly inflated uncertainty. The 0.06–0.10°C discrepancies in the third column of Table 3 for 1880–2019 LOESS‐GSAT_F are likely because the LOESS window centered at 1880 captures Krakatoa's large post‐1883 cooling, thereby reducing the 1880 LOESS estimate and increasing its 1880–2019 ΔGMST. These results show that such biases are period‐dependent, are indeed negligible for 1850–1900 to 2019 in these models, and support our choice of time periods in the analysis using observational data sets.

Table 3. Long‐Term ΔGSAT Estimated for Various Periods for the Ensemble Mean T_F, Plus the Ensemble Medians and 5%–95% Ranges for Estimates Based on LOESS, OLS or Taking the Mean of the Raw SAT Outputs

Method	1850–1900 to 2010–2019	1850–1900 to 2019	1880 to 2019
MPI‐ESM ΔGSAT [°C], median [5%–95%] [17%–83%]
T_F	1.15 [1.15–1.16] [1.15–1.16]	1.25 [1.23–1.28] [1.24–1.27]	1.20 [1.17–1.23] [1.18–1.22]
LOESS	1.16 [1.07–1.24] [1.11–1.21]	1.25 [1.15–1.36] [1.21–1.32]	1.26 [1.15–1.36] [1.20–1.31]
OLS	1.02 [0.93–1.12] [0.97–1.07]	1.13 [1.04–1.23] [1.08–1.18]	1.15 [1.06–1.23] [1.10–1.20]
Individual runs	1.15 [1.07–1.24] [1.11–1.20]	1.24 [1.04–1.48] [1.12–1.40]	1.20 [0.92–1.50] [1.04–1.39]
CSIRO Mk3.6.0 ΔGSAT [°C], median [5%–95%] [17%–83%]
T_F	0.92 [0.90–0.93] [0.91–0.92]	1.03 [0.99–1.07] [1.00–1.05]	0.93 [0.88–0.98] [0.90–0.96]
LOESS	0.93 [0.79–1.04] [0.82–1.01]	1.05 [0.89–1.18] [0.90–1.12]	1.03 [0.84–1.16] [0.91–1.10]
OLS	0.63 [0.46–0.72] [0.52–0.70]	0.73 [0.56–0.85] [0.61–0.82]	0.75 [0.58–0.87] [0.64–0.83]
Individual runs	0.91 [0.78–1.04] [0.83–1.00]	1.03 [0.81–1.22] [0.86–1.12]	0.94 [0.66–1.15] [0.76–1.05]

Uncertainties in T_F differences are derived by treating T_F as a sample mean and assuming the ensemble members follow a Gaussian distribution in any given year. The period errors are then combined in quadrature. GSAT, global near‐surface air temperature; OLS, Ordinary least squares.

As our carbon budget calculations include an internal variability error component, we consider ensemble spread and statistical fit uncertainties as candidates and compare the LOESS_bsln ensemble 83rd minus 17th percentile and the statistical 17%–83% ranges for each run over 1850–1900 to 2019. The CSIRO Mk3.6.0 17%–83% ensemble spread in GSAT LOESS_bsln is 0.22°C. This is larger than the median ensemble member's statistical range (0.18°C) and similar to the largest individual ensemble member range (0.22°C). For MPI‐ESM the ensemble spread (0.11°C) is smaller than the median statistical uncertainty (0.16°C) and is marginally lower than the smallest member value (0.12°C). For the internal variability component of ΔGSAT uncertainty in our carbon budgets we present results both using statistical uncertainty (derived only from observational data) and a more conservative estimate using the ±0.11°C CSIRO Mk3.6.0 ensemble spread.
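The ensemble spread used here is the 83rd minus the 17th percentile of the per‐member LOESS_bsln estimates, and half of that spread gives the ± value carried into the carbon budget. A minimal sketch follows; the synthetic ensemble is hypothetical and stands in for the per‐member estimates, which are not reproduced in this text, and the function name is illustrative.

```python
import numpy as np


def likely_spread(values):
    """83rd minus 17th percentile of a set of ensemble estimates."""
    p17, p83 = np.percentile(values, [17, 83])
    return float(p83 - p17)


# Hypothetical 30-member ensemble of warming estimates in degC.
rng = np.random.default_rng(1)
ensemble = 1.05 + 0.11 * rng.standard_normal(30)
print(f"synthetic 17%-83% spread: {likely_spread(ensemble):.2f} degC")

# Value quoted in the text: a 0.22 degC spread corresponds to +/-0.11 degC.
csiro_spread = 0.22
print(f"CSIRO half-range: +/-{csiro_spread / 2:.2f} degC")
```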

This large ensemble analysis has:

(i) provided limited support for our LOESS‐based statistical uncertainty estimates being similar to model variability
(ii) shown that LOESS matches or exceeds period difference performance while having lower long‐term bias and short‐term uncertainty than OLS
(iii) verified that LOESS reliably reproduces ΔGSAT_F outside of years immediately following large volcanic eruptions, particularly supporting our LOESS_bsln results as an estimate of ΔGSAT_F,anthro

Global SAT Estimate and Remaining Carbon Budget

We now convert our best estimate ΔGMST_LOESS of 1.14°C [1.05–1.25] (17%–83% uncertainty) to an equivalent ΔGSAT_LOESS as outlined in Section 2.2.4. Our CMIP6 ensemble LOESS_bsln A_blend ratio ΔGSAT_LOESS/ΔGMST_LOESS reflects an increase of ΔGSAT_LOESS over full‐coverage ΔGMST_LOESS of 5.8% [4.4, 7.2] in 2014, that is, long‐term near‐surface air temperature warming is 5.8% greater than our blended estimate. This A_blend estimate is very similar to equivalent CMIP5‐based estimates, but much lower than the 24% derived in CMIP5 for 1861–1880 to 2006–2015 using a HadCRUT4‐like masking and blending algorithm (Richardson et al., 2018). This is due to the different handling of sea ice and the incorporation of complete (unadjusted) spatial coverage in the A_blend calculation.

Combining this ratio and its uncertainty with our Global_3 ΔGMST_LOESS, as outlined in Section 2.2.4, we obtain ΔGSAT_LOESS of 1.21°C [1.11–1.32] from 1850–1900 to 2019, a lower uncertainty than the equivalent SR1.5 estimate of ±0.12°C (Section 1.2.1.2 in Allen et al., 2018). The conservative CSIRO‐based internal variability yields a wider ΔGSAT_LOESS range of 1.07°C–1.37°C. These estimates all represent uncertainty in total forced warming; however, uncertainty in anthropogenic warming was estimated to be still higher at ±0.2°C (Section 1.2.1.3 in Allen et al., 2018). The equivalent LOESS_bsln HadCRUT4 estimate using the SR1.5 correction of ∼15% yields a slightly lower ΔGSAT_obs of 1.17°C, and the updated SR1.5 2006–2015 estimate extended to the end of 2019 is 1.15°C. Finally, A_blend corrected LOESS_bsln HadCRUT4 yields 1.08°C; the difference of 0.13°C from our main ΔGSAT_LOESS primarily reflects HadCRUT4 coverage bias, as well as a small sea ice edge effect. The other carbon budget calculation components also have large uncertainties. Cumulative emissions to the end of 2019 are 2,320 ± 230 GtCO₂ (Friedlingstein et al., 2019), while nonCO₂ uncertainties are even higher (see Table 2.2 in Rogelj et al., 2018). Although no formal methods exist to combine these uncertainties, Rogelj et al. (2018) estimated overall uncertainty of ±50% in SR1.5 remaining carbon budgets.
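The central values in this paragraph follow from simple scaling of the Table 2 LOESS_bsln estimates, as the sketch below shows. It covers central estimates only (the uncertainty ranges require the combination described in Section 2.2.4), treats the SR1.5 correction as exactly 15%, and uses illustrative variable names.

```python
a_blend = 1.058         # median CMIP6 A_blend (5.8% increase over blended GMST)
sr15_correction = 1.15  # SR1.5 HadCRUT4 correction, taken here as exactly 15%

global3_gmst = 1.14     # Global_3 LOESS_bsln, 1850-1900 to 2019 (degC)
hadcrut4_gmst = 1.02    # HadCRUT4 LOESS_bsln, 1850-1900 to 2019 (degC)

global3_gsat = a_blend * global3_gmst             # main estimate
hadcrut4_sr15 = sr15_correction * hadcrut4_gmst   # HadCRUT4 with SR1.5 correction
hadcrut4_blend = a_blend * hadcrut4_gmst          # HadCRUT4 with A_blend only

print(f"Global_3 GSAT:            {global3_gsat:.2f} degC")
print(f"HadCRUT4, SR1.5 factor:   {hadcrut4_sr15:.2f} degC")
print(f"HadCRUT4, A_blend only:   {hadcrut4_blend:.2f} degC")
print(f"Global_3 minus HadCRUT4:  {global3_gsat - hadcrut4_blend:.2f} degC")
```

Applying the same A_blend to both series isolates the part of the difference that comes from the underlying GMST data sets rather than from the air‐versus‐sea temperature adjustment.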

Figure 5 shows the calculation for the remaining carbon budget with a 66% chance to stay below 1.5°C, along with the historical cumulative CO₂ emissions and temperature change.

Figure 5. Global temperature change from 1850–1900 versus cumulative CO₂ emissions. The smoothed temperature response from the Global_3 blended GMST group as decadal average (blue) and LOESS_bsln trend (purple) are shown relative to cumulative CO₂ emissions from Friedlingstein et al. (2019). The thick black line shows the Global_3 GSAT LOESS_bsln trend, obtained by adjusting GMST by the ratio of GSAT and blended GMST historical runs from an ensemble of 24 CMIP6 models. The pink shaded plume and dark red line are the estimated temperature response to cumulative CO₂ emissions (TCRE) from the beginning of 2020 on. Also shown are other remaining carbon budget factors, T_nonCO₂ and E_Esfb (gray arrows). The thick black double arrow represents the remaining carbon budget for a 66% chance of remaining below 1.5°C. Vertical error bars show ΔGSAT combined observational and statistical uncertainty (dark blue), combined observational and internal variability derived from the CSIRO ensemble (medium blue) and estimated uncertainty in anthropogenic warming (light blue). CSIRO, Commonwealth Scientific and Industrial Research Organisation; GMST, global mean surface temperature; GSAT, global near‐surface air temperature.

Our remaining carbon budgets incorporate the SR1.5 Table 2.2 100 GtCO₂ adjustment for earth‐system feedbacks (CO₂ and CH₄ release from warming wetland and permafrost thaw), following recent practice established in Rogelj et al. (2019) and Nauels et al. (2019). Carbon budgets excluding this term are therefore 100 GtCO₂ higher, as in the SR1.5 “headline” remaining carbon budget of 420 GtCO₂ (IPCC, 2018) to remain under 1.5°C (with 66% chance).

The remaining carbon budgets from the start of 2020 for a 66% (50%) chance to stay below 1.5°C and 2.0°C are 220 (350) GtCO₂ and 880 (1,270) GtCO₂ respectively (rounded to nearest 5 GtCO₂). Given current annual emissions of just over 40 GtCO₂, the 66% 1.5°C remaining carbon budget is only ∼15 GtCO₂ lower than the equivalent carbon budgets including earth‐system feedbacks in SR1.5 Table 2.2 (320 GtCO₂ from 2018) and Nauels et al. (2019) (235 GtCO₂ from 2020). However, our 50% 1.5°C carbon budget is ∼45 GtCO₂ below those two studies. This follows from the slightly higher ΔGSAT_obs found in this study, combined with an identical TCRE spread starting in 2020 rather than the SR1.5 reference period centered at the start of 2011. In effect, the up‐to‐date LOESS_bsln estimate of ΔGSAT_obs reduces the contribution of TCRE uncertainty, as there is less ΔT “to go.”

SR1.5's secondary carbon budgets used the average ΔGMST through 2006–2015 to obtain a 66% chance of staying below 1.5°C, resulting in an equivalent budget of 470 GtCO₂ from 2018 (i.e., 385 GtCO₂ from 2020). Our alternative budget using Global_3 ΔGMST_LOESS instead of ΔGSAT_LOESS is 305 GtCO₂ from 2020. This large difference relative to SR1.5 is unsurprising as the Global_3 series show more historical warming whereas the SR1.5 ΔGMST_period average included HadCRUT4 and its more substantial coverage bias. We also note that an OLS 1880–2019 ΔGMST basis would imply even higher 1.5°C 66% remaining carbon budgets of 455 GtCO₂ (Global_3) or 485 GtCO₂ (OpAll).
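Budgets quoted from different start years can be compared only after rebasing them to a common date. The sketch below does this with the figures given in the text: the two‐year emissions total is inferred from the stated 470 GtCO₂ (from 2018) and 385 GtCO₂ (from 2020) pair rather than taken from an emissions inventory, and the variable names are illustrative.

```python
# SR1.5 GMST-based budget for a 66% chance of staying below 1.5 degC (GtCO2).
sr15_gmst_from_2018 = 470
sr15_gmst_from_2020 = 385

# Emissions over 2018-2019 implied by the two figures above.
emitted_2018_2019 = sr15_gmst_from_2018 - sr15_gmst_from_2020
annual_emissions = emitted_2018_2019 / 2

# Rebase the SR1.5 Table 2.2 GSAT-based budget (including feedbacks) to 2020.
sr15_gsat_from_2018 = 320
sr15_gsat_from_2020 = sr15_gsat_from_2018 - emitted_2018_2019

# Budgets from this study, from the start of 2020.
this_study_gsat = 220
this_study_gmst = 305

print(f"implied annual emissions: {annual_emissions:.1f} GtCO2/yr")
print(f"SR1.5 GSAT budget from 2020: {sr15_gsat_from_2020} GtCO2")
print(f"GSAT-based gap: {sr15_gsat_from_2020 - this_study_gsat} GtCO2")
print(f"GMST-based gap: {sr15_gmst_from_2020 - this_study_gmst} GtCO2")
```

The implied emissions rate is consistent with the stated "just over 40 GtCO₂" per year, and the rebased SR1.5 value matches the Nauels et al. (2019) figure quoted from 2020.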

Discussion and Conclusions

We have explored the range of warming estimates since the late 19th century across different observational series using multiple estimation methodologies, focusing on the Global_3 subset of extensively interpolated data sets (NASA GISTEMP, Cowtan‐Way and Berkeley Earth). Our main LOESS_bsln Global_3 ΔGMST since 1850–1900 is, to our knowledge, the first such estimator that (i) integrates robust statistical uncertainties, with fit residuals following the assumed noise process, (ii) has been extended to provide a corresponding ΔGSAT_LOESS since 1850–1900, including combined observational and internal variability uncertainties, and (iii) has been validated against output from model large ensembles.

IPCC SR1.5 reported ΔGMST_period of 0.87°C to 2006–2015 using four data sets (1.0°C when extended to 2017) and estimated ΔGSAT_period of 0.97°C by adjusting one data set (HadCRUT4) for biases related to incomplete coverage and sea‐air temperature differences, effectively discarding the other three. The ensuing carbon budget calculation included cumulative emissions up to 2017, necessitating an implicit extension of ΔGSAT_period to that date. The simplicity and coherence of our “up‐to‐date” ΔGMST_LOESS and ΔGSAT_LOESS estimates represent a clear advance over the IPCC ΔGMST period difference and ΔGSAT derivation methods. Not only is LOESS_bsln generally an unbiased ΔGMST_F estimator outside periods of volcanism, but the method includes a more consistent and intuitive baseline alignment of data sets beginning in 1880 and maintains the previously stated advantage of including statistical uncertainty derived using a noise model consistent with the data. Moreover, validation tests with observations and the large ensembles confirm that LOESS_bsln results in lower biases relative to ΔGSAT_F and lower susceptibility to natural variation. None of this is surprising considering that the IPCC period difference method is essentially a 10‐year moving average.

Another key difference with IPCC SR1.5 is our consistent use of the Global_3 data sets with extensive spatial interpolation. As previously noted in Section 2.1.1, these data sets are demonstrably more representative of global climate change and require smaller and less uncertain adjustments (∼6%) to obtain ΔGSAT_LOESS from ΔGMST_LOESS, in contrast to the 15% adjustment applied to HadCRUT4 ΔGMST_period in IPCC SR1.5. The Global_3 data sets give 0.12°C more warming than HadCRUT4 since 1850–1900, and the divergence related to unmitigated coverage bias may well grow, as the Global_3 LOESS_md trend is now 0.04°C/decade higher than HadCRUT4's 0.17°C/decade. Focusing on the three Global_3 data sets and our robust LOESS_bsln estimator dramatically reduces the spread between ΔGMST estimates: the inter‐dataset spread in Global_3 LOESS_bsln 1850–1900 to 2019 ΔGMST is only 0.07°C. Including the non‐global data sets increases the LOESS_bsln spread to 0.17°C, and including OLS and LOESS_md trend methodologies increases the spread to 0.27°C: from 0.93°C (OLS for HadCRUT4) to 1.20°C (Berkeley Earth LOESS_md).

SR1.5 also reported 1880–2012 and 1880–2015 linear trend ΔGMST, but mainly to provide “traceability” to the IPCC AR5. In contrast, AR5's main estimate of 0.85°C was based on the mean linear trend of available data sets, while the HadCRUT4 2003–2012 period difference from 1850–1900 was a primary input for further analyses such as future projections (Collins et al., 2013) and attribution (Bindoff et al., 2013).

If IPCC AR6 follows AR5 and provides both period difference and point‐to‐point trends for data sets beginning in 1850, that would imply the three post‐1850 data sets would form the basis for 2010–2019 period ΔGMST_obs relative to 1850–1900. As noted above though, LOESS_bsln to 2019 offers a superior alternative. Since HadCRUT4 clearly does not meet our “quasi‐global” criterion, we omit it as a direct component of ΔGMST_LOESS. Nevertheless, HadCRUT4 and its underlying land and ocean data sets (CRUTEM4 and HadSST3) form the essential basis of Cowtan‐Way, and HadSST3 is also a key component of Berkeley Earth. Following the precedent set in IPCC SR1.5, the ERSSTv5 based data sets starting 1880 should also be considered, using baseline matching over 1880–1900. Our Global_3 group member, NASA GISTEMP is an obvious choice for inclusion, while NOAA GlobalTemp could be omitted according to our global coverage criterion. However, that case is less clear cut than HadCRUT4 due to NOAA's complicated spatial coverage. Once again, though, NOAA's GHCNv4 and ERSSTv5 data sets would still be present as they form the essential basis of NASA GISTEMP.

The recent release of HadCRUT5 (Morice et al., 2020) will certainly inform future regular updates of our main ΔGMST_LOESS and ΔGSAT_LOESS estimates. HadCRUT5 features sophisticated kriging interpolation, resulting in virtual coverage similar to Berkeley Earth, and incorporates updated data sets for land (CRUTEM5; Osborn et al., 2020) and ocean (HadSST4; Kennedy et al., 2019). We give a preliminary evaluation of the eventual effect of HadCRUT5 (and HadSST4) in Table S4. The incorporation of HadSST4 (instead of HadSST3) into Cowtan‐Way and Berkeley Earth results in a noticeable increase in ΔGMST_LOESS, while results for HadCRUT5 are nearly identical to Cowtan‐Way/HadSST4.

Since observational data sets beginning in 1880, such as NASA GISTEMP, potentially could be included alongside the three data sets starting in 1850, LOESS_bsln ΔGMST arguably renders 1880–2019 ΔGMST_OLS redundant in IPCC AR6. However, AR5 also compared long‐term ΔGMST_OLS trends starting from 1880 to short‐term trends starting from mid‐century or later. Our results reinforce that 1880–2019 linear trend is inconsistent with LOESS_md 1880–2019 ΔGMST. The bias of long‐term OLS ΔGMST was confirmed in analysis of two large ensembles, which also showed that it has 26%–62% larger uncertainty than LOESS_md for recent 30‐year trends. As seen in Table S2, observed OLS trends from 1951 have wider uncertainty than the corresponding LOESS_md estimates and show evidence of warm bias as well (for example the NASA GISTEMP 1951–2019 OLS is almost identical to 1880–2019). We therefore recommend LOESS_md over linear trend for both long‐term (>120 years) and short‐term (30–70 years) intervals.

LOESS_bsln statistical uncertainties represent another opportunity for AR6. If ΔGMST_LOESS is close enough to ΔGMST_F then with an appropriate noise model the ΔGMST_LOESS uncertainty due to internal variability could be derived from the LOESS residuals. We combined this with observational uncertainty and carried it forward directly to ΔGSAT_LOESS for carbon budget calculations, but it could also be used for other follow‐on analyses. The median statistical uncertainties from the large ensemble runs are within 25% of the ensemble spreads, and the residual autocorrelation structure implies potential for this approach.
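As a sketch of how a noise model can enter such an uncertainty estimate, the example below assumes an AR(1) model for the residuals and applies the common effective sample size adjustment n_eff = n(1 − r₁)/(1 + r₁), where r₁ is the lag‐1 autocorrelation. The AR(1) choice and the function names are assumptions made for illustration; the noise model actually used for the LOESS residuals may be different.

```python
import math


def lag1_autocorr(residuals):
    """Lag-1 autocorrelation of a residual series."""
    n = len(residuals)
    m = sum(residuals) / n
    dev = [r - m for r in residuals]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


def effective_sample_size(residuals):
    """AR(1) effective sample size: n * (1 - r1) / (1 + r1)."""
    r1 = lag1_autocorr(residuals)
    return len(residuals) * (1 - r1) / (1 + r1)


# Toy residuals with weak positive persistence (not real data).
residuals = [1, 1, -1, -1, 1, 1, -1, -1]
r1 = lag1_autocorr(residuals)
n_eff = effective_sample_size(residuals)
# Factor by which a white-noise standard error would be widened.
inflation = math.sqrt(len(residuals) / n_eff)
```

Positive autocorrelation reduces the effective number of independent residuals, so an uncertainty computed under a white‐noise assumption would be widened by the inflation factor.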

However, global climate models may not capture long‐term internal variability (Brown et al., 2015). For example, recent Pacific changes may indicate stronger real‐world multidecadal variability (e.g., England et al., 2014), although consensus is lacking (Seager et al., 2019). We take no position on the ability of models to generate this variability; we only note that it has been studied in CMIP5 (e.g., Brown et al., 2015) and CMIP6 (e.g., Parsons et al., 2020) and report on how errors would affect our conclusions. Substantial internal variability on ±20 year timescales or longer would result in underestimated LOESS uncertainties. By contrast, large forced changes on shorter timescales, such as those due to volcanism, would increase the uncertainties. Nevertheless, our method derives uncertainties directly from observations and so may have advantages over approaches that rely on model outputs or estimated forcings (Haustein et al., 2017; Otto et al., 2015).

Given the above caveats we provided a more conservative ΔGSAT uncertainty incorporating the CSIRO model large ensemble spread and its pronounced internal variability. Since our ΔGMST_LOESS and ΔGSAT_LOESS estimates are close to observation‐based anthropogenic warming, confirming a basic finding of IPCC SR1.5, we treat our ΔGSAT_LOESS as an estimate of ΔGSAT_F,anthro, albeit with appropriately wider uncertainties. In general, our approach yields straightforward and up‐to‐date estimates of ΔGMST and ΔGSAT to inform remaining carbon budget calculations that incorporate appropriate ΔGSAT uncertainties.

To summarize, we argue strongly in favor of LOESS_bsln ΔGMST using series with near‐global coverage. Combining our statistical estimate of internal variability with data set spread and data set parametric uncertainty results in a best estimate of warming from 1850–1900 to 2019 of 1.14°C [1.05–1.25] (17%–83% uncertainty). Not only is this updated through 2019, rather than the prior‐decade value of the IPCC's period mean difference, but it also includes a potentially useful statistical fit uncertainty that is not readily or typically derived for period mean differences.

Our CMIP6‐derived GSAT adjustment yields corresponding ΔGSAT_LOESS of 1.21°C [1.11–1.32] (17%–83% uncertainty), implying a remaining carbon budget of ∼220 GtCO₂ for a 66% chance that ΔGSAT since 1850–1900 remains below 1.5°C. This carbon budget is ∼5.5 years of current emissions and is less than half the 455–485 GtCO₂ carbon budget implied by an OLS ΔGMST basis. Our ΔGSAT estimate uncertainty can be adapted to a desired interpretation of ΔGSAT, for example, as total or anthropogenic warming. All ΔGSAT_LOESS and ΔGMST_LOESS indices can be updated annually and are only dependent on the temperature data sets, yielding a set of transparent and easily communicated metrics to measure progress toward climate goals.
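The relation between a remaining budget and years of emissions is simple division, sketched below. The annual emission rate is not stated in this passage; the value of about 40 GtCO₂ per year used here is only what the quoted ∼220 GtCO₂ and ∼5.5 years imply together, and the function name is hypothetical.

```python
def years_of_emissions(budget_gtco2, annual_gtco2):
    """Years until a carbon budget is exhausted at a constant emission rate."""
    return budget_gtco2 / annual_gtco2


# Emission rate implied by the quoted figures (an inference, not a stated value).
implied_rate = 220 / 5.5

loess_years = years_of_emissions(220, implied_rate)
# The OLS-based budget range quoted in the text, at the same assumed rate.
ols_years = [years_of_emissions(b, implied_rate) for b in (455, 485)]
```

At the same assumed rate, the 455–485 GtCO₂ range corresponds to a little over 11 to 12 years, which is consistent with the statement that the LOESS‐based budget is less than half the OLS‐based one.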

Acknowledgments

The authors thank Andrew Dessler for provision of MPI‐GE series. DCC thanks Shaun Lovejoy and Lenin Del Rio Amador for clarifying discussions. MR's contribution was carried out at the Jet Propulsion Laboratory, California Institute of Technology under a contract with the National Aeronautics and Space Administration (80NM0018D004).

Data Availability Statement

Berkeley Earth data are available from http://berkeleyearth.org/data/. Cowtan‐Way data, including merged HadSST4 series, are available from http://www-users.york.ac.uk/~kdc3/papers/coverage2013/series.html. HadCRUT4.6 data are available from https://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/download.html. HadCRUT5 data are available from https://www.metoffice.gov.uk/hadobs/hadcrut5/data/current/download.html. NASA GISTEMP data are available from https://data.giss.nasa.gov/gistemp/. NOAA GlobalTemp data are available from https://www.ncei.noaa.gov/data/noaa-global-surface-temperature/v5/access/timeseries/. CMIP6 data are available from https://esgf-node.llnl.gov/search/cmip6/.

© 2021. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Change in global mean surface temperature (ΔGMST), based on a blend of land air and ocean water temperatures, is a widely cited climate change indicator that informs the Paris Agreement goal to limit global warming since preindustrial to “well below” 2°C. Assessment of current ΔGMST enables determination of remaining target‐consistent warming and therefore a relevant remaining carbon budget. In recent IPCC reports, ΔGMST was estimated via linear regression or differences between decade‐plus period means. We propose nonlinear continuous local regression (LOESS) using ±20 year windows to derive ΔGMST across all periods of interest. Using the three observational GMST data sets with almost complete interpolated spatial coverage since the 1950s, we evaluate 1850–1900 to 2019 ΔGMST as 1.14°C with a likely (17%–83%) range of 1.05°C–1.25°C, based on combined statistical and observational uncertainty, compared with linear regression of 1.05°C over 1880–2019. Performance tests in observational data sets and two model large ensembles demonstrate that LOESS, like period mean differences, is unbiased. However, LOESS also provides a statistical uncertainty estimate and gives warming through 2019, rather than the 1850–1900 to 2010–2019 period mean difference centered at the end of 2014. We derive historical global near‐surface air temperature change (ΔGSAT), using a subset of CMIP6 climate models to estimate the adjustment required to account for the difference between ocean water and ocean air temperatures. We find ΔGSAT of 1.21°C (1.11°C–1.32°C) and calculate remaining carbon budgets. We argue that continuous nonlinear trend estimation offers substantial advantages for assessment of long‐term observational ΔGMST.

Details

Title

The Benefits of Continuous Local Regression for Quantifying Global Warming

Author

Clarke, David C¹; Richardson, Mark²

¹ Independent Researcher, Montreal, QC, Canada
² Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA; Department of Atmospheric Science, Colorado State University, Fort Collins, CO, USA

Section

Research Article

Publication year

2021

Publication date

May 2021

Publisher

John Wiley & Sons, Inc.

e-ISSN

2333-5084

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1029/2020EA001082

ProQuest document ID

2532660497
