1 Introduction
With the successful implementation of the Montreal Protocol, various observations confirm reductions in the concentrations of halogenated ozone-depleting substances (ODSs) in the atmosphere . Satellite data records also confirm a peak in upper-stratospheric HCl (the main chlorine reservoir) around 1997, followed by a steady decline . Hence, attention has turned towards the detection and attribution of ozone recovery (e.g. ). However, the accurate quantification of ozone changes is challenging because of the quality of long-term ozone profile data sets, where measurement errors are of similar or larger magnitude than the long-term ozone trends. In addition, complex coupling between various physical and chemical processes controlling stratospheric ozone concentrations cause large short-term ozone changes. Complications also arise because there are some non-linear changes in stratospheric dynamics as well as chemical constituents. For example, between 2018 and 2021, some of the largest and smallest ozone losses of the recent decades were recorded in both the Arctic and Antarctic polar stratospheres (e.g. ). Some observational data suggest that there has been a continuous decline in lower-stratospheric ozone , which could be attributed to changes in stratospheric dynamics (e.g. ). Atmospheric concentrations of ODSs such as CFC-11 are decreasing at uneven rates , which could induce variability in ozone trends. Additionally, significant positive trends have been detected in very short-lived substances (VSLSs) containing chlorine and bromine that are not controlled by the Montreal Protocol
As there are no long-term ozone profile data from a single satellite instrument, various attempts have been made to merge such data from different instruments. However, individual satellite instruments have different temporal and spatial resolution depending on the measurement techniques and retrieval algorithms
Hence, adopted a novel approach to create a gap-free stratospheric ozone profile data for the 1979–2005 time period. They used SAGE (I and II) satellite profile measurements and polar ozonesondes, together with a seasonally varying ozone climatology from to fill the gaps, to generate multi-variate regression-based gap-free ozone anomalies. Later, used a similar methodology along with climate model simulations to extend the time series backwards to 1850. The data were recommended for the historical CMIP5 simulations for the climate models that did not include stratospheric chemistry, in order to enforce time-dependent ozone variations. used a different methodology to create a satellite-based long-term ozone profile data set. Along with SAGE I and II measurements, they used HALOE and POAM (Polar Ozone and Aerosol Measurement) II and III satellite measurements, as well as ozonesonde data from 130 stations, to create a collection of binary data files – also known as the Binary DataBase of Profiles (BDBP) version 1.0. updated the BDBP data set to construct “Bodeker Scientific” or “BS” data. They updated BDBP data by including measurements from the Limb Infrared Monitor of the Stratosphere (LIMS), the Improved Limb Atmospheric Spectrometer (ILAS) and ILAS II. They used a multivariate regression model to create different versions of the ozone profile data set ranging from the surface to 70 km for the 1979–2008 time period. revised and extended (1979–2016) the BS data set by using the TOMCAT chemical transport model (CTM) ozone profiles as a transfer function to capture ozone variability for the period without satellite observations.
Another widely used merged data set is the Global OZone Chemistry And Related trace gas Data records for the Stratosphere (GOZCARDS, ). These are monthly mean zonally averaged time series constructed using ozone profile measurements from several NASA satellite instruments and the Atmospheric Chemistry Experiment Fourier Transform Spectrometer (ACE-FTS, ). Merging is done primarily by removing average biases between SAGE II and individual data records for overlap periods . The GOZCARDS data files contain mixing ratios on a pressure–latitude grid (316 to 0.1 hPa), updated later to GOZCARDS v2.2 .
adopted a slightly different approach to construct the Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) data set. SWOOSH merges stratospheric ozone profile data from solar occultation instruments (SAGE-II/III, HALOE, ACE-FTS) as well as limb-scanning instruments (UARS-MLS and Aura-MLS). The measurements are homogenised by applying corrections that are calculated from data taken during time periods of instrument overlap. The primary SWOOSH data product consists of monthly mean zonal-mean values on a pressure grid at 2.5, 5 and 10 resolution. One of the major characteristics of SWOOSH data is that when merging greater weight is given to the instruments that sample more frequently (e.g. Aura-MLS). Filled and unfilled versions of the data set exist on both geographical and equivalent latitude coordinates.
Several additional attempts have been made to merge satellite time series from limb and occultation instruments. For example, the SAGE-CCI-OMPS data set, described in , includes SAGE II time series and several limb data sets. The OMPS-LP data set used is produced at the University of Saskatchewan, Saskatoon . First, they screened and homogenised CCI data sets in the HARMOZ format before merging them in terms of ozone anomalies. Recently, created a merged SCIAMACHY-OMPS limb data set (SCIA-OMPS), which combines these two time series produced at the University of Bremen. They used MLS data series as a transfer function to merge SCIAMACHY with OMPS-LP as these instruments share only 2 months of overlap, but MLS was not included in the merged data set. This time series is monthly averaged, covers the period 2002–present and is longitudinally resolved, with a 5 latitude 20 longitude grid. Due to the similarities in the measurement geometries and techniques, as well as in the retrieval approaches, they implemented a plain de-biasing approach for the merging, directly obtaining a long-term ozone time series in appropriate units.
Another widely accepted approach is using data assimilation techniques to create observation-based data (e.g. ). However, only a few instruments such as MLS provide relatively long-term ozone profile measurements. For the pre-MLS time period, very few observations are available that can provide good constraint on the assimilation system. Also, the forward model is generally forced with available (re)analysis dynamical fields so reanalysis data sets are also prone to the inhomogeneities in the forcing fields along with any discrepancies in chemical scheme.
In this paper we present a new data–model method for producing a long-term data set of stratospheric ozone. We use ozone profile output from a CTM to create a machine-learning-based satellite-corrected long-term chemically (and dynamically) consistent ozone profile data set (hereafter, ML-TOMCAT) for the 1979–2020 time period. The CTM setup is described in Sect. 2, followed by our methodology in Sect. 3. Comparisons of ML-TOMCAT with some of the other available merged ozone profile data sets are presented in Sect. 4, with a summary of our key results in Sect. 5.
2 Model setup
We use chemically consistent monthly mean zonal mean ozone profiles from the TOMCAT CTM as the basis data set. TOMCAT is an offline three-dimensional (3D) CTM that includes a comprehensive stratospheric chemistry scheme . For the present study, the CTM setup is similar to the control simulations used in our recent studies such as , and . Briefly, TOMCAT is forced with meteorological fields from ERA-5 reanalyses , starting from 1979. Simulations are performed at a 2.8 2.8 horizontal resolution with 32 hybrid sigma-pressure levels extending from the surface to about 60 km. For major ODSs and greenhouse gases (GHGs) the model uses time-dependent observed global mean surface mixing ratios that are treated as well mixed throughout the troposphere. The model also includes the effects of solar flux variability and heterogeneous chemistry on volcanically enhanced stratospheric aerosol as described in . Solar irradiance data are from the NRL2 empirical model and the sulfate aerosol surface area density (SAD) from . TOMCAT also includes chlorine and bromine contributions from VSLSs as described in . A passive ozone tracer (no chemical ozone loss), generally used to diagnose chemical ozone loss, is initialised every 6 months from the chemical ozone tracer (1 June and 1 December). TOMCAT has been regularly used to study long-term changes in stratospheric trace gases, showing good agreement with various ground-based and satellite data sets (e.g. ).
3 Methodology
We use the random-forest (RF) regression analysis to generate a long-term chemically consistent data set. The RF is a supervised machine-learning (ML) algorithm that uses an ensemble of decision trees
A decision tree algorithm divides the data set into branches (using true and false criteria), which further segregate into other branches until a leaf node (or result node) is reached. Multiple trees are constructed by randomly sampling data points multiple times (e.g. bootstrap method). Hence, an individual tree can be considered a unique tree (hence unique output). RF uses a bagging technique, which means the RF model consists of many individual decision trees and aggregated predictions are used for the final prognosis. A distinct advantage of RF regression is that it is relatively accurate and very easy to set up. RF can also behave like a non-linear regression method. As RF adds randomness to the decision procedure, instead of relying on the most important explanatory variables, it searches for the best variable among random subsets. This ensures that the final output does not rely heavily on a single explanatory variable, thereby avoiding overfitting
Initially, TOMCAT zonal mean ozone profiles are linearly interpolated in log-pressure space on to 43 equidistant (12 per decade) pressure levels (1000–0.1 hPa, MLS pressure levels), followed by spatial interpolation onto 72 SWOOSH latitude bins at resolution. SWOOSH data are obtained via
For the regression analysis, a 20-year (largely MLS covering) time period is selected in order to avoid heteroscedasticity (i.e. effect of different sampling frequencies/methodologies
Overall, there are five explanatory variables in the regression model for individual grid points, and these are taken from TOMCAT output fields. The regression model can be represented as
1 where , , , and can be considered to be the contribution coefficient for a given explanatory variable, and PO, HCl and CH are TOMCAT monthly mean zonal mean tracers. For the calculation of dTCO we use Copernicus Climate Change Service (C3S) total ozone data (1979–2018). The C3S total column product is a combination of total column data from 15 sensors using gap-filling assimilation methods and is obtained via
Atmospheric chemical models are ideal tools for understanding/simulating past (and future) ozone changes, as they combine up-to-date knowledge about various physical and chemical processes using a mathematically consistent framework. Different models use different combinations of chemical and dynamical schemes to represent important processes in the atmosphere. However, some of these processes are computationally expensive; hence, they are represented by somewhat simplified parameterisations. For example, many chemical models prescribe observation-based sulfate surface area density (SAD) to represent the effects of volcanically enhanced stratospheric aerosol for simulating heterogeneous chemistry which leads to ozone loss . Many models also prescribe surface concentrations of GHGs and ODSs rather than emission fluxes. CTMs such as TOMCAT use dynamical forcing fields from (re)analyses data sets such as ERA-Interim or ERA-5. Hence, CTMs are subject to possible inhomogeneities due to changes in the number of assimilated observations, as well as other deficiencies (e.g. missing processes) in the forward model used in the assimilation system. On the other hand, observational data sets are also subject to errors associated with the measurement techniques, instrument degradation and retrieval algorithms. Hence, almost all chemical models may be expected to show a bias against observational data records, either because of model deficiencies or errors in the observations. However, chemical models do use a consistent chemical scheme, so we can assume that chemical model–observation ozone differences are largely due to uncertainties in the forcing fields such as meteorology (e.g. winds, temperature) and chemical parameterisations (e.g. reaction rates, solar fluxes, photolysis schemes). CTMs have the distinct advantage in terms of dynamics as they are forced with up-to-date reanalysis data, although with the above-noted caveat of possible inhomogeneities in observations used in the assimilation systems. Hence, the recently initiated SPARC Reanalysis Intercomparison Project (S-RIP) is aimed at providing guidance on future reanalysis activity. S-RIP also plans to perform comprehensive evaluation and intercomparison of different reanalysis data products; for details see
4.1 Model biases
Figure shows climatological (2006–2020) monthly zonal mean differences between TOMCAT and SWOOSH ozone profiles (TOMCAT minus SWOOSH). TOMCAT profiles show an almost symmetrically structured negative biases in the upper stratosphere and positive biases in the lower stratosphere. The largest negative biases (up to 0.8 ppm) occur in the tropical upper stratosphere (around 3 hPa), and they remain negative throughout the year. The ozone lifetime at these altitudes is less than a day; hence, the observed biases might be associated with deficiencies in the photochemical reactions in the model. At this altitude, ozone production is largely controlled by solar fluxes below 240 nm, while longer wavelengths control ozone destruction
Figure 1
Latitude–pressure cross sections of the climatological (2006–2020) monthly mean difference (ppm) between TOMCAT and SWOOSH ozone profiles.
[Figure omitted. See PDF]
In the lower stratosphere the ozone lifetime ranges from months to years; so, positive biases in the TOMCAT ozone could be due to a combination of both dynamics and chemistry. First, reduced overhead ozone could increase lower-stratospheric ozone via the self-healing effect; i.e. increased ultraviolet radiation increases ozone production at lower altitudes
As the exact causes of TOMCAT ozone biases are still not well understood, we use the RF model to remove them. The RF regression model coefficients are derived using 20 years (1991–1998, 2006–2018) for which SWOOSH data include a large number of observational profiles, especially from MLS on the UARS and Aura satellite platforms. The RF regression model uses 20 years of monthly data, with 80 % and 20 % of data points being used for training and testing, respectively. The estimated RF regression coefficients are then used to calculate model biases for the entire 42-year time period (1979–2020). RF-calculated ozone biases are then added to the TOMCAT time series to create the long-term gap-free data set, hereafter labelled ML-TOMCAT.
Figure shows how much variance (or ) of the data the RF regression model is able to explain, along with regression coefficients for individual explanatory variables. For example, an value of 0.8 indicates that the RF regression model is able to explain 80 % of the biases in TOMCAT ozone relative to SWOOSH data for the 20 years of the training period. also represents the sum of the regression coefficients from individual explanatory variables. Overall, the RF regression model performance is consistently high () throughout the stratosphere, except for the mid-stratosphere, which is a transition region where the TOMCAT ozone biases change from positive to negative. At high northern latitudes, mid-stratospheric values decrease to 0.6. However, since TOMCAT–SWOOSH differences are much smaller here, a RF-based correction has a minimal impact on the quality of ML-TOMCAT ozone profiles.
Figure 2
Latitude–pressure cross sections of the variance () and regression coefficients from passive ozone, HCl, CH, solar and total column ozone anomaly (see main text Eq. 1).
[Figure omitted. See PDF]
Additionally, as expected, the RF regression coefficients are significant in different regions of the stratosphere for various explanatory variables. The passive ozone tracer seems to show the largest coefficients in the tropical mid-stratosphere, as well as varying contributions in different regions of the stratosphere. The passive ozone contribution in the tropical mid-stratospheric could be linked to the incomplete representation of NO-related chemical changes in TOMCAT and/or seasonal changes in the stratospheric transport in the reanalysis
After analysing the regression coefficients, we now present a comparison between ML-TOMCAT and available satellite-based long-term data sets. Due to key differences between satellite measurement techniques, ozone profiles are retrieved either at altitude or pressure levels and either in units of mixing ratio or number density. For example, MLS retrieves profiles of ozone mixing ratio on pressure levels, whereas SAGE retrieves profiles of number density on altitude levels. Hence, merging these different data sets needs pressure, temperature or altitude information at a given co-location from an external source such as reanalysis data. The GOZCARDS and SWOOSH data sets use MERRA2 reanalysis data to convert SAGE II ozone number density profiles on fixed pressure levels . ML-TOMCAT is based on modelled ozone profiles as a function of pressure, although conversion to altitude (geopotential height) coordinates is straightforward. In particular, ML-TOMCAT data were processed on corresponding grids/units using ERA-5 geopotential height, temperature and pressure fields that are used to drive TOMCAT.
This subsection consists of two parts. First we compare ML-TOMCAT profiles with data sets using pressure co-ordinate systems (e.g. SWOOSH, GOZCARDS), followed by comparisons with altitude-based data sets (SAGE-CCI-OMPS, SCIA-OMPS, BSVert).
4.3.1 Comparison with pressure level data
As noted earlier, we used only 20 years of SWOOSH data to train the RF model. Hence, the next obvious step is to compare ML-TOMCAT ozone with SWOOSH over the full time period. Figure compares relative differences (in percent) of ML-TOMCAT with GOZCARDS and SWOOSH, respectively, as a function of latitude and pressure. ML-TOMCAT shows slightly positive biases in the middle stratosphere and somewhat negative biases in the upper and lower stratosphere with respect to both SWOOSH and GOZCARDS data. The largest biases (up to 10 %) are observed in the tropical lowermost stratosphere as well as polar latitudes. However, these largest differences in the tropical lowermost stratosphere (and upper troposphere) cannot be correctly validated as most satellite retrievals show largest uncertainties in this region . Similarly, for the non-MLS period, the biases in the polar stratosphere could be due to the lack of observational ozone profiles during polar night.
Figure 3
Relative differences (in percent) as a function of pressure and latitude between ML-TOMCAT and (a) GOZCARDS V2 and (b) SWOOSH . Stippling indicates regions where differences are smaller than 1 standard deviation.
[Figure omitted. See PDF]
Figure shows TOMCAT, ML-TOMCAT, SWOOSH and GOZCARDS ozone time series over the Equator ( latitude) at three pressure levels (1, 10 and 50 hPa). Supplement Figs. S1 to S10 show similar comparisons at 15 N, 15 S, 30 N, 30 S, 45 N, 45 S, 60 N, 60 S, 75 N and 75 S latitude bins. The grey shaded area indicates the standard deviation of the ozone values within each bin for the GOZCARDS time series. The green shaded areas indicates the root mean square uncertainty of the combined data sets for each bin in SWOOSH data ( in ). Overall, there is a good agreement between the ML-TOMCAT, GOZCARDS and SWOOSH time series. As seen in Fig. , ML-TOMCAT shows significant improvements in the tropical stratosphere compared to TOMCAT.
Figure 4
Comparison between TOMCAT (blue lines) and ML-TOMCAT (red lines) ozone mixing ratios over the Equator (0) at (a, top) 1 hPa, (b, middle) 10 hPa and (c, bottom) 50 hPa. Satellite-based ozone mixing ratios from GOZCARDS and SWOOSH data sets along with their uncertainty estimates (shaded) are shown with black- and green-coloured lines, respectively.
[Figure omitted. See PDF]
A peculiar detail of Fig. is that the standard deviation in the SWOOSH time series is largest during the 1991–1999 time period, which could be due to a combination of various factors. First, UARS MLS ozone profiles are retrieved at only six levels per pressure decade instead of 12 levels per decade for Aura MLS (see
Next we scrutinise percentage differences between GOZCARDS and ML-TOMCAT on the same pressure levels. Figure shows relative differences between TOMCAT, ML-TOMCAT and SWOOSH ozone time series with respect to GOZCARDS. As seen earlier, TOMCAT ozone shows up to 40 % positive biases in the lower stratosphere and 10 % negative biases in the upper stratosphere (also seen in Fig. ). In contrast, ML-TOMCAT biases are well below 5 % at all levels. At 50 hPa, TOMCAT biases seems to follow QBO-type oscillations that are correctly removed in ML-TOMCAT. Similarly, at 1 hPa TOMCAT differences show some uneven variations that could be linked to the inhomogeneities in the ERA-5 dynamical fields that are used to force TOMCAT. Furthermore, ML-TOMCAT differences show much smaller and almost linear biases at 1 hPa and lie well within the spread of GOZCARDS data.
Figure 5
As Fig. but for the residuals, i.e. relative differences between SWOOSH (green), TOMCAT (blue) and ML-TOMCAT (red) ozone with respect to GOZCARDS ozone.
[Figure omitted. See PDF]
Interestingly, although both GOZCARDS and SWOOSH are created by merging nearly identical data sets, there are differences between them which are largest for the 1984–2004 time period. This indicates that even slight differences in merging methodology lead to large differences in the merged data set. Although we use completely independent output from a CTM as a basis data set, GOZCARDS–ML-TOMCAT differences are within the expected discrepancy between GOZCARDS and SWOOSH data sets, especially at 10 and 50 hPa.
Another notable feature in Fig. is that at 50 hPa ML-TOMCAT shows the largest differences during 2020, which could be associated with the biases in ERA-5 dynamics during that period. A TOMCAT sensitivity simulation forced with ECMWF operational analysis data shows better agreement with MLS ozone variation during this period
We now compare ML-TOMCAT ozone profiles against altitude-based merged satellite data sets. Figure shows the relative differences between TOMCAT/ML-TOMCAT vs. SAGE-CCI-OMPS , BSVert and SCIA-OMPS data sets as a function of altitude and latitude. The top panels (a and b) compare the mean relative differences between the SAGE-CCI-OMPS data set, TOMCAT and ML-TOMCAT, respectively. Here TOMCAT shows large positive biases (up to 20 %) in the lowermost stratosphere and negative biases (up to 15 %) in the upper stratosphere. On the other hand, ML-TOMCAT shows only 10 % biases throughout the stratosphere. Larger biases are seen in the Antarctic stratosphere that could be attributed to the limited observational ozone profiles used to construct the altitude-based merged satellite data products. Interestingly, ML-TOMCAT shows largest biases (up to 30 %) with respect to the BSVert data set, though TOMCAT profiles (forced with ERA-Interim) are used as transfer function while constructing BSVert . In addition, in the lowermost stratosphere, biases are negative in the tropics and SH mid-latitudes and positive in the NH middle and high latitudes. Hence, a contributing factor for these hemispherically asymmetric biases with respect to BSVert ozone profiles might be differences between ERA-Interim and ERA-5 reanalysis data
Figure 6
Relative difference (%) as a function of latitude and altitude between (a) TOMCAT versus SAGE-CCI-OMPS (1985–2019) and ML-TOMCAT versus (b) SAGE-CCI-OMPS (1985–2019), (c) BSVert (1985–2017) and (d) SCIA-OMPS (2002–2019), averaged over the respective time series. Stippling indicates regions where differences are smaller than 1 standard deviation.
[Figure omitted. See PDF]
Figure compares TOMCAT and ML-TOMCAT profiles with the three altitude-based ozone data sets with a focus on the Equator (0 latitude). Supplement Figs. S11 to S20 show similar comparisons for 15 N, 15 S, 30 N, 30 S, 45 N, 45 S, 60 N, 60 S, 75 N, and 75 S latitude bins. Figure displays the respective relative differences with respect to the SAGE-CCI-OMPS data set, which in this case is taken as a reference. In this way it is possible to evaluate the improvement introduced by applying the ML algorithm but also have an estimation of the discrepancies between different merged data sets, which is expected to be on the order of 5 %–10 %. With respect to the comparison with the data sets on the pressure vertical coordinate, the scatter between the time series is larger here, due to the larger variety of different satellites available to produce the merged products and the fact that they have not been used in the ML training.
Figure 7
Comparison between TOMCAT (blue lines) and ML-TOMCAT (red lines) ozone mixing ratios over the Equator (0) at (a, top) 45 km, (b, middle) 35 km and (c, bottom) 25 km. Satellite-based ozone mixing ratios from SAGE-CCI-OMPS, BSVert and SCIA-OMPS data sets are shown with black-, green- and cyan-coloured lines, respectively.
[Figure omitted. See PDF]
At about 45 km in the tropics the ML algorithm seems to overcorrect the negative bias shown by TOMCAT, leading to generally higher ozone values with respect to the other data sets, especially in the first half of the time series. In the middle stratosphere we find the best agreement between SAGE-CCI-OMPS and ML-TOMCAT; here the expected discrepancies among the merged data sets are comparable to the differences observed between ML-TOMCAT and SAGE-CCI-OMPS. At the peak of the ozone number profile around 25 km, we notice generally lower values for ML-TOMCAT, on average by 5 %. Similar biases are observed at middle and high latitudes as well as seen in Supplement Figs. S11 to S20. The strong seasonal cycle seen in the TOMCAT difference with respect to the merged data sets is largely reduced by ML-TOMCAT at this altitude.
4.3.3 Polar regionsThe use of ML-TOMCAT helps to fill the observational gaps especially in atmospheric regions with a lack of observations and before the beginning of the 21st century, when satellite measurements were sparser. For example, polar regions during local winter cannot be observed by limb observations based on scattered sunlight. Instruments such as Aura MLS and the Sounding of the Atmosphere using Broadband Emission Radiometry (SABER, ) have generally been used to fill this gap over the last two decades. For chemical models, complexities are also associated with the denitrification and dehydration (or chlorine activation) schemes that determine heterogeneous ozone losses . Though most of our earlier studies showed that TOMCAT is able simulate to polar ozone losses quite realistically (e.g. ), some systematic biases in polar stratosphere were noted in and . Figure compares ozone at 18 km over the Arctic, demonstrating the good agreement between ML-TOMCAT and MLS in this region for both local summer and winter seasons. The bottom panel shows the ozone sub-column over the Antarctic (poleward of 70 S latitude) integrated between 12 and 20 km for TOMCAT, ML-TOMCAT and MLS averaged over September–October months. The good agreement between MLS and ML-TOMCAT during the ozone hole period is observed for most of the years. ML-TOMCAT enables the reconstruction of the large ozone losses which occurred in the 1980s during a phase when ozone-depleting substances were on a rapid rise before the implementation of the Montreal Protocol and their phaseout.
Figure 8
Same as Fig. but for the residuals, i.e. relative differences between TOMCAT (blue), ML-TOMCAT (red), BSVert (green) and SCIA-OMPS (cyan) ozone with respect to SAGE-CCI-OMPS.
[Figure omitted. See PDF]
Figure 9
(a) Ozone concentration time series (molecules cm) at 18 km over the Arctic region (latitudes poleward of 70 N). Aura-MLS and the Sounding of the Atmosphere using Broadband Emission Radiometry (SABER, ) data are superimposed on TOMCAT and ML-TOMCAT time series. (b) Mean ozone sub-column (DU) between 12–20 km for September and October each year over the Antarctic region (latitudes poleward of 70 S).
[Figure omitted. See PDF]
Figure 10
ML-TOMCAT (red line) and TOMCAT (blue line) total column ozone comparison with SBUV merged ozone data (MOD, black line) obtained from
[Figure omitted. See PDF]
4.3.4 Total column comparisonAs noted above, total column measurements have relatively small retrieval errors and high temporal resolution and thus provide an important data set for assessing model performance. Hence, we compare total column ozone from ML-TOMCAT and TOMCAT with the SBUV merged ozone (MOD) data set (
As noted earlier, TOMCAT total column differences are relatively small for both Arctic (60–90 N) and Antarctic (60–90 S) regions, and Fig. 10 clearly shows that the same is true for ML-TOMCAT total columns as well. For mid-latitudes (35–60), TOMCAT shows biases of up to 20 DU biases (especially in the NH mid-latitude) compared to observations, whereas ML-TOMCAT shows differences of less than 10 DU. Interestingly, for the tropics (20 S–20 N), TOMCAT shows negative biases until 2000 and slightly positive biases afterwards that are almost negligible in ML-TOMCAT time series. On the other hand, for the near-global average (60 S–60 N), ML-TOMCAT biases remain positive until 1990 and are close to TOMCAT biases. After 2000 TOMCAT seems to show slightly increasing positive biases with respect to SBUV MOD data, but ML-TOMCAT seems to show almost negligible biases without any apparent trend.
Overall ML-TOMCAT ozone profiles outside the 316–1 hPa range should be considered (slightly modified) TOMCAT model profiles. However, total column (and tropospheric column shown in Supplement Fig. S22) comparisons suggest that vertically integrated (1000–0.1 hPa) ML-TOMCAT profiles can provide a useful estimate which is better than TOMCAT and also better than combining ML-TOMCAT stratospheric column with tropospheric column from other sources (noting that levels above 1 hPa have a negligible contribution to the column). Hence, for convenience we include both tropospheric and lower-mesospheric ozone values in ML-TOMCAT data files (1000–0.1 hPa) even though they are only based on values from the TOMCAT model outside of the pressure range 316–1 hPa. For future versions of ML-TOMCAT we aim to also correct tropospheric ozone profile biases using merged tropospheric ozone profile data sets described, for example, in the Tropospheric Ozone Assessment Report (TOAR).
5 Data availability
We thank Sean Davies for SWOOSH data, which are publicly available via
6 Summary and conclusions
Stratospheric ozone concentrations are affected by many short- and long-term processes; hence, high-quality ozone profile data sets are needed for accurate attribution studies. Though satellite instruments provide global measurements, due to their short mission durations various merging methodologies have been adopted to create homogeneous and gap-free long-term ozone profile data sets. Individual merging methodologies have distinct advantages and disadvantages. Atmospheric chemical models are also able to simulate chemically consistent long-term data sets, but they are prone to the deficiencies associated with the simplified parameterisations and uncertain parameters.
Here we have used TOMCAT CTM ozone profiles and a random-forest (RF) regression model to create gap-free ozone profile data set (ML-TOMCAT) for 1979–2020. The RF is applied to the ozone difference between the SWOOSH and TOMCAT ozone profiles by selecting 20 years of MLS measurements (UARS-MLS and AURA-MLS) as a training period. RF show consistent performance throughout the stratosphere, except at high latitudes and the mid-latitude mid-stratosphere. Overall, ML-TOMCAT shows excellent agreement with SWOOSH for the entire time period (1984–2020), though somewhat larger differences are apparent for the period where limited ozone measurements are available for SWOOSH construction. We also find that ML-TOMCAT shows better agreement with satellite-based merged data sets which use pressure as the vertical coordinate (e.g. SWOOSH, GOZCARDS) but weaker agreement with the data sets which use altitude (e.g. SAGE-CCI-OMPS, SCIA-OMPS). We find that at almost all stratospheric levels ML-TOMCAT ozone concentrations are well within uncertainties of the observational data sets. Presently, the ML-TOMCAT V1.0 data set is ideally suited for the evaluation of chemical model ozone profiles from the tropopause to 0.1 hPa. ML-TOMCAT V1.0 ozone profile data on pressure and altitude levels in mixing ratios and number density units are publicly available via 10.5281/zenodo.5651194 .
The supplement related to this article is available online at:
Author contributions
SSD conceived the idea and initiated the study in discussion with MPC. The model runs were performed and analysed by SSD, MPC and WF. The figures were prepared by CA and SSD. The paper was written by SSD, MPC and MW, who included comments from all of the other coauthors.
Competing interests
The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements
We thank NASA, NOAA and ESA for GOZCARDS, SWOOSH and SAGE-CCI-OMPS data products. We thank the European Centre for Medium-Range Weather Forecasts for providing their analyses. TOMCAT simulations were performed on the UK national Archer and Leeds Arc4 HPC systems.
Financial support
This research has been supported by the NERC SISLAC project (NE/R001782/1). The financial support of part of this work from the state of Bremen, DAAD PRIME grant (Carlo Arosio), and ESA SOLVE Living Planet Fellowship (Carlo Arosio) is gratefully acknowledged. The University of Bremen contribution also benefited from the DFG VolImpact and BMBF SynopSys projects.
Review statement
This paper was edited by Martin Schultz and reviewed by two anonymous referees.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2021. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
High-quality stratospheric ozone profile data sets are a key requirement for accurate quantification and attribution of long-term ozone changes. Satellite instruments provide stratospheric ozone profile measurements over typical mission durations of 5–15 years. Various methodologies have then been applied to merge and homogenise the different satellite data in order to create long-term observation-based ozone profile data sets with minimal data gaps. However, individual satellite instruments use different measurement methods, sampling patterns and retrieval algorithms which complicate the merging of these different data sets. In contrast, atmospheric chemical models can produce chemically consistent long-term ozone simulations based on specified changes in external forcings, but they are subject to the deficiencies associated with incomplete understanding of complex atmospheric processes and uncertain photochemical parameters.
Here, we use chemically self-consistent output from the TOMCAT 3-D chemical transport model (CTM) and a random-forest (RF) ensemble learning method to create a merged 42-year (1979–2020) stratospheric ozone profile data set (ML-TOMCAT V1.0). The underlying CTM simulation was forced by meteorological reanalyses, specified trends in long-lived source gases, solar flux and aerosol variations. The RF is trained using the Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) data set over the time periods of the Microwave Limb Sounder (MLS) from the Upper Atmosphere Research Satellite (UARS) (1991–1998) and Aura (2005–2016) missions. We find that ML-TOMCAT shows excellent agreement with available independent satellite-based data sets which use pressure as a vertical coordinate (e.g. GOZCARDS, SWOOSH for non-MLS periods) but weaker agreement with the data sets which are altitude-based (e.g. SAGE-CCI-OMPS, SCIAMACHY-OMPS). We find that at almost all stratospheric levels ML-TOMCAT ozone concentrations are well within uncertainties of the observational data sets. The ML-TOMCAT (V1.0) data set is ideally suited for the evaluation of chemical model ozone profiles from the tropopause to 0.1 hPa and is freely available via10.5281/zenodo.5651194
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details






1 School of Earth and Environment, University of Leeds, Leeds, UK; National Centre for Earth Observation, University of Leeds, Leeds, UK
2 Institute for Environmental Physics, University of Bremen, Bremen, Germany
3 School of Earth and Environment, University of Leeds, Leeds, UK; National Centre for Atmospheric Science, University of Leeds, Leeds, UK