1 Introduction
Investigating ocean dynamics and climate variability requires accurate, regular and systematic observations of the sea surface temperature (SST). SST indeed plays a key role in air–sea interaction and upper ocean circulation processes; it is used to track climate variability and change, and it underpins various chemical and biological processes. SST and the estimate of its gradients have also proven to be powerful tools for assessing and investigating mesoscale and submesoscale variability.
In this context, here we investigate the potential of applying deep learning models to improve the effective resolution of a gap-free SST field, providing small-scale features even where direct measurements are missing. We exploit techniques generally used in the field of computer vision, which have proven to be very successful, especially for processing gridded data and managing large-scale datasets while keeping the computational cost under control. In the field of oceanography, the research community has only recently started to explore the applicability of machine learning (ML) methods to ocean remote sensing images. The applications include oil spill and eddy detection and parametrization, marine algae species discrimination, forecasting of ocean variables, and estimation of meteorological parameters. Moreover, good results have been achieved by applying neural networks (NNs) to space–time interpolation and short-term forecasting with satellite altimetry data, high-performance descriptions of turbulence processes, and SST reconstruction.
Among image-processing techniques, impressive results have been obtained with convolutional neural networks (CNNs), owing to their ability to extract the most important information from two-dimensional spatial fields. Recently, the application of CNN architectures to the reconstruction of high-resolution images from low-resolution ones, the so-called single-image super-resolution (SR) problem, has attracted much attention across a wide range of scientific challenges. The idea is to implement a network that directly learns the end-to-end mapping between low- and high-resolution images. One of the simplest attempts, by Dong et al., was the construction of a network for image restoration composed of three two-dimensional convolutional layers with different kernel sizes. The three layers can be seen as the three conceptual phases of this NN algorithm: a first extraction of overlapping tiles from the input images, converting them to feature maps; a non-linear mapping of these maps onto high-dimensional vectors of high-resolution feature maps; and a final reconstruction, aggregating the above representations to generate the final high-resolution image, which is expected to be close to the ground truth. Despite the simplicity of the architecture, this Super Resolution Convolutional Neural Network (SRCNN) achieved excellent results with respect to more traditional methods, and it has already been applied to reconstruct satellite-derived SST data with promising results. Building on Dong's work, several more complex structures have been developed to tackle the super-resolution problem. In subsequent years, attempts to develop residual networks showed the convergence improvement of deeper architectures, mainly given by the introduction of skip connections and recursive convolutions. Similarly, a step forward was made by Lim et al. with the development of an enhanced deep residual network for super resolution (EDSR), which makes use of residual blocks with constant scaling layers after the last convolutional layer in order to stabilize the training even in the presence of a large number of filters. This modification led to significantly better accuracy using much deeper networks, while controlling the computational cost of the training phase. A further step was the introduction of adaptive residual blocks (ARBs), which replace the constant factor with adaptive residual factors, increasing the adaptability of the network. More specifically, in the ARB, feature responses (i.e., filter output channels) are re-calibrated on a channel-wise basis using a so-called squeeze-and-excitation module before being combined with the block input. This process enhances the network's capability to capture intricate relationships among the learned feature channels. Recently, to further push these networks to efficiently handle different spatial scales in a multichannel input (each channel including a different variable with characteristic feature scales), Buongiorno Nardelli et al. (2022) introduced dilated convolutional multi-scale learning modules into the ARB-based network, expanding the network receptive fields while still controlling its computational cost.
In that work, the deep learning network was designed to super-resolve absolute dynamic topography (ADT) by learning from both low-resolution ADT and high-resolution SST data through an observing system simulation experiment (namely, simulating all observations through an ocean general circulation numerical model).
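To make the three-phase SRCNN design described above concrete, a minimal sketch in Keras follows. The kernel sizes (9, 1, 5) and filter counts (64, 32) are taken from the SRCNN literature rather than from this paper, and the single-channel input shape is an assumption, not the configuration used in this work:

```python
# Minimal SRCNN-style network: patch extraction, non-linear mapping,
# reconstruction. Hyperparameters follow the SRCNN literature and are
# assumptions here, not the configuration used in this work.
from tensorflow import keras
from tensorflow.keras import layers

def build_srcnn(input_shape=(None, None, 1)):
    inputs = keras.Input(shape=input_shape)
    # 1) Patch extraction and representation as feature maps
    x = layers.Conv2D(64, 9, padding="same", activation="relu")(inputs)
    # 2) Non-linear mapping onto high-resolution feature maps
    x = layers.Conv2D(32, 1, padding="same", activation="relu")(x)
    # 3) Reconstruction of the final high-resolution image
    return keras.Model(inputs, layers.Conv2D(1, 5, padding="same")(x))

model = build_srcnn()
model.compile(optimizer="adam", loss="mse")
```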
Our aim is to exploit the ability of the dilated SR network to increase the gap-free SST effective resolution by directly training our network on remote sensing SST data for both input and target datasets. We make use of the data produced by the Institute of Marine Sciences of the Italian National Research Council (CNR-ISMAR) within the Copernicus Marine Service, which consist of merged multi-sensor (L3S) and gap-free (L4) sea surface temperature products over the Mediterranean Sea at high resolution (HR; nominally 0.0625°) and ultra-high resolution (UHR; nominally 0.01°). Considering that our UHR interpolation accounts for space–time decorrelation scales of 5 km and 1 d, in the presence of valid L3 UHR observations the resulting L4 data can be considered submesoscale-resolving. The UHR OI processing makes use of upsized L4 HR fields as an initial guess. This guess is left unchanged in the absence of L3 UHR data, and the L4 effective resolution is thus, by definition, coarser than 0.01° there. Therefore, our goal is to enhance the effective resolution of the gap-free background field upsized onto the 0.01° grid.
2 Materials and methods
2.1 Training and test datasets
When dealing with deep learning methods, it is important to construct the training and test datasets carefully; ensure a sufficient generalization capability; and, more specifically, avoid under- and over-fitting problems. Under-fitting occurs when the model fails to achieve a suitably low error on the training set, while over-fitting occurs when the gap between the training error and the test error becomes excessively wide. Remote sensing data are well suited to preventing these problems, given the wide availability of large-scale gridded datasets that are complex enough to encapsulate an extensive variety of features.
Figure 1
Sketch of the processing chains for the production of L3S and L4 SST fields at 0.0625° and 0.01° spatial resolutions over the Mediterranean Sea. Green boxes represent shared modules, blue boxes contain the information and the data for the HR product, yellow boxes refer to the UHR processing chain, and red boxes highlight the application of the super-resolution technique.
The suite of products considered for this project provides the foundation SST (i.e., the temperature free of diurnal variability) over the Mediterranean Sea from 2008 to the present (10.48670/moi-00172) in near-real time (NRT). These data are built from level 2 (L2) infrared measurements (i.e., data in satellite native grid/swath coordinates) processed through a series of steps divided into different modules, as sketched in Fig. 1.
The L2 measurements are obtained from several instruments on board both geostationary and polar-orbiting satellites (including Sentinel-3A and Sentinel-3B, NOAA-20, Suomi NPP, Metop-B, Metop-C, Aqua, Terra, and SEVIRI). After a first shared module for the upstream data collection (M1), two separate processing chains are used to obtain the SST products at 0.0625° and 0.01° spatial resolution. Both include a second module for the geographical and temporal extraction of the L2 data and the remapping onto the corresponding regular grid (M2) and a third module for the bias correction and quality control, which leads to the merging of all the data from the different sensors (M3). The output of each of these processing paths is an L3S product at the associated spatial resolution. The final gap-free fields are provided by applying a space–time optimal interpolation algorithm to the L3 data for both HR and UHR products (M4). However, the two OI schemes use different initial guesses in the absence of observed data. While the HR processing chain makes use of a climatological background field to produce the L4 SST image at 0.0625° resolution, in the case of the finer-resolution product, the initial guess used by the OI algorithm is the L4 HR SST field, preliminarily upsized onto a 0.01° regular grid (through a thin plate spline). As a consequence, due to the small decorrelation scales assumed in the UHR interpolation, small-scale features are correctly represented only when valid UHR L3 observations are present close to the interpolation point within a short temporal window. For this reason, in the absence of these observations, the final UHR product will have an effective resolution that is equal to or coarser than 0.0625°.
We train the network to improve the satellite SST effective resolution by introducing realistic small-scale features into the interpolated and upsized gap-free L4 HR SST images (the dilated Adaptive Deep Residual Network for Super Resolution, dADRSR, input in Fig. 2). The target data are derived from a ground-truth super-collated L3S UHR SST dataset (namely, merged multi-sensor data) that was specifically built for this purpose. The dataset (hereinafter called L3S Sentinel) is obtained by applying the first three modules of the UHR SST processing chain to acquisitions only from the Sea and Land Surface Temperature Radiometer (SLSTR) on board the Sentinel-3A and Sentinel-3B satellites, chosen for their high radiometric accuracy and kilometer-scale resolving capabilities.
Both input and target datasets are mapped onto a regular grid at 0.01° spatial resolution over the Mediterranean Sea for the year 2020. Since our goal is to retrieve small-scale features, a moving-average high-pass filter (with a kernel radius of 200 km) is first applied to remove the large-scale dynamics in both input and target images. Then, all 100 × 100 pixel regions (herein referred to as tiles) consisting of at least 95 % valid values were extracted from the filtered fields. Tiles were allowed to overlap at most one other tile by 50 %. SST values are then transformed into anomalies, to avoid seasonal variability, and scaled between −1 and 1 through a classical min–max normalization technique, i.e.,
(1) $x' = 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1$,
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the SST anomalies.
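The preprocessing just described can be sketched as follows in Python. The boxcar implementation of the high-pass filter, the stride used to honor the 50 % overlap constraint and the NaN handling are assumptions; the kernel radius must be supplied in pixels:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def highpass(field, radius_px):
    """Moving-average high-pass: subtract a boxcar low-pass of the field.
    NaN gaps are excluded from the average via a validity-weighted filter."""
    valid = np.isfinite(field).astype(float)
    filled = np.where(np.isfinite(field), field, 0.0)
    size = 2 * radius_px + 1
    lowpass = uniform_filter(filled, size) / np.maximum(uniform_filter(valid, size), 1e-9)
    return field - lowpass

def extract_tiles(field, tile=100, stride=50, min_valid=0.95):
    """Collect tile x tile patches with at least 95 % finite pixels; a stride
    of half a tile keeps the overlap between neighboring tiles at 50 %."""
    patches = []
    ny, nx = field.shape
    for j in range(0, ny - tile + 1, stride):
        for i in range(0, nx - tile + 1, stride):
            patch = field[j:j + tile, i:i + tile]
            if np.isfinite(patch).mean() >= min_valid:
                patches.append(patch)
    return np.stack(patches) if patches else np.empty((0, tile, tile))

def minmax_scale(x, xmin, xmax):
    """Eq. (1): min-max normalization of the SST anomalies to [-1, 1]."""
    return 2.0 * (x - xmin) / (xmax - xmin) - 1.0
```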
Figure 2
On the left are the SST first-guess map used to extract the input tiles (a) and the SST L3S Sentinel target image (b) on 4 January 2020. On the right are examples of extracted tiles used for the training and test datasets.
Figure 3
Schematic of the dilated adaptive deep residual network for super resolution developed by Buongiorno Nardelli et al. (2022). Conv stands for convolutional, M-ARB for multi-scale adaptive residual block, and SE for squeeze-and-excitation module; Add indicates aggregations of the outputs from the network's blocks.
The test dataset is selected by setting aside 4 individual days that are representative of the different seasons, comprising 15 % of the total number of tiles available after the preprocessing. These data form a fully independent dataset (i.e., data never used in the training phase) composed of 18 576 overlapping tiles. The training dataset finally consists of 94 110 pairs of tiles extracted from the remaining 362 d, of which one part (properly separated from the original training dataset at the beginning of the training phase following a fixed ratio between the two) is used by the network in the validation phase. In Fig. 2, an example of the images used to construct the two datasets is shown. The top-left panel depicts the gap-free first-guess map (i.e., L4 HR upsized to 0.01° spatial resolution) on 4 January 2020, with the corresponding L3S Sentinel image derived from merged Sentinel-3A and Sentinel-3B data below it. On the right, we provide an example of a pair of tiles extracted from the related low- and high-resolution SST fields, fed as the input and target to train the network, respectively.
2.2 Super-resolution convolutional neural network
In deep learning, super-resolution algorithms are example-based methods, which generate exemplar patches from the input image. As mentioned above, the application of convolutional neural networks to the super-resolution problem is based on networks that directly learn an end-to-end mapping between low- and high-resolution images. These networks consist of a series of interconnected layers, which make use of the convolution operator, simulating the connectivity between neurons observed in the organization of the animal visual cortex. Formally, the output of each layer is a function of a transformation of the previous layer's output; that is, the following applies:
(2) $F_l(\mathbf{Y}) = \sigma\left(W_l * F_{l-1}(\mathbf{Y}) + B_l\right)$,
where $\sigma$ is the non-linear activation function (in this case, the rectified linear unit or ReLU), $W_l$ the weights, $B_l$ the biases and $*$ the convolution operator. The arrays of weights, generally called filters or kernels, are able to detect specific types of features in the input (generating what is called a feature map) and might include, for instance, Gaussian-like filters, edge detectors along different directions or any other kind of filter learned during the training. Since each layer has its own functionality, different layers contain different structures. The main idea is that the network learns the weights of the filters and updates the parameters of the system through an optimization process based on minimizing the error between the network output and the target data. This simple yet powerful idea can be augmented as much as desired, especially when exploiting the potential of deep networks. The CNN implemented here was originally developed by Buongiorno Nardelli et al. (2022) and is called the dilated Adaptive Deep Residual Network for Super Resolution (dADRSR). In the dADRSR network (schematized in Fig. 3), the low-resolution input is initially fed to three parallel dilated convolutional layers with the same kernel size (3 × 3) but increasing dilation factors (1, 3 and 5, respectively), which allows for a larger receptive field, extracting information at different scales without increasing the number of parameters. After this first stage, the data pass through a sequence of 12 multi-scale adaptive residual blocks (M-ARBs), each including two sets of parallel dilated convolutional layers (with 120 and 10 filters, respectively) and a squeeze-and-excitation (SE) module that improves channel interdependencies at almost no computational cost, before being summed to produce the final high-resolution output. The main characteristic of the adaptive strategy is the replacement of the fixed scaling of learned features with an adaptive scaling mechanism, carried out by the SE block (a form of attention mechanism). It captures the global importance of each channel by initially squeezing each feature map to a single numeric value (therefore obtaining a vector of a size equal to the number of channels) and finally feeding this output to a two-layer "bottleneck" network, which produces new feature maps, scaling each channel based on its importance. A detailed discussion of the architecture of the network may be found in Buongiorno Nardelli et al. (2022). The training algorithm follows an early stopping rule, which terminates the iterations as soon as the validation loss increases for a previously chosen number of epochs (defined by the patience parameter, which is set to 20 here). An adaptive learning rate is handled by the Adam optimizer, whose hyperparameters are set following the values found in the most recent literature: the numerical stability constant is set to 10⁻⁷, and the exponential decay rates for the first- and second-moment estimates are set to 0.9 and 0.999, respectively. Instead of the classical dropout regularization technique, a DropBlock strategy is implemented, in which contiguous regions of a feature map are dropped together; this has been shown to increase the accuracy of the network for convolutional layers. To evaluate the accuracy, the mean-squared error is used in the loss function. Finally, the dADRSR model has almost 1.6 million trainable parameters.
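The two distinctive ingredients, the multi-scale dilated convolutions and the SE-based adaptive re-calibration, can be sketched in Keras as below. This is a simplified reading of the block structure described above, assuming the exact wiring (reduction ratio, channel-matching 1 × 1 convolution, placement of the skip connection) rather than reproducing the published code:

```python
from tensorflow import keras
from tensorflow.keras import layers

def se_module(x, reduction=16):
    """Squeeze-and-excitation: squeeze each feature map to one value, then a
    two-layer bottleneck outputs a per-channel scaling (channel attention)."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                      # squeeze
    s = layers.Dense(max(channels // reduction, 1), activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)         # excitation
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                            # re-calibration

def multi_scale_dilated(x, filters):
    """Three parallel 3x3 convolutions with dilation 1, 3 and 5: a wider
    receptive field at the same parameter count, then channel concatenation."""
    branches = [layers.Conv2D(filters, 3, padding="same", dilation_rate=d,
                              activation="relu")(x) for d in (1, 3, 5)]
    return layers.Concatenate()(branches)

def m_arb(x, filters=120, proj=10):
    """Sketch of a multi-scale adaptive residual block: two multi-scale
    stages, a 1x1 channel-matching convolution (an assumption), SE
    re-calibration and the residual skip connection."""
    y = multi_scale_dilated(x, filters)
    y = multi_scale_dilated(y, proj)
    y = layers.Conv2D(x.shape[-1], 1, padding="same")(y)
    y = se_module(y)
    return layers.Add()([x, y])
```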
All codes are written in Python using the deep learning framework Keras, and the training was performed on a single NVIDIA T4 GPU in almost 4 h.
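A sketch of the corresponding training configuration is given below. The initial learning rate, the batch size and the validation split are assumptions (the text does not state them), while the Adam decay rates, the MSE loss and the patience of 20 epochs follow the text; `x_train` and `y_train` are hypothetical arrays holding the normalized input and target tiles:

```python
from tensorflow import keras

# Assumptions: learning rate 1e-4, batch size 32, 15 % validation split.
optimizer = keras.optimizers.Adam(learning_rate=1e-4,
                                  beta_1=0.9, beta_2=0.999, epsilon=1e-7)
model.compile(optimizer=optimizer, loss="mse")  # mean-squared-error loss

# Early stopping on the validation loss with a patience of 20 epochs.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=20,
                                           restore_best_weights=True)

history = model.fit(x_train, y_train,
                    validation_split=0.15,
                    batch_size=32,
                    epochs=500,          # upper bound; early stopping decides
                    callbacks=[early_stop])
```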
2.3 Evaluation of model performances
Three different error measures are calculated to evaluate the network reconstruction performance between the ground-truth image and the network output.
Firstly, we consider the classical root-mean-squared error (RMSE) given by
(3) $\mathrm{RMSE}(X,Y) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - Y_i\right)^2}$,
where $X_i$ and $Y_i$ are the SST values of the $i$th pixel of images $X$ and $Y$, respectively, and $N$ is the total number of pixels. This measure can be useful for evaluating the accuracy of a reconstructed value pixel by pixel, but it can be misleading in the assessment of the ocean state reconstruction, depending on the objectives of the application considered. For instance, if the NN reproduces an ocean structure in a position that is slightly misplaced with respect to the ground-truth measurement, the RMSE will be high, indicating a reconstruction that is even poorer than if the structure had been missed entirely. This issue is often referred to as the double penalty issue, since point-matching measures penalize the misplacement twice (where the structure should actually be and where it is incorrectly predicted). However, in some cases, capturing an ocean phenomenon, even in a slightly wrong position, is better than missing it completely. For this reason, we also consider two additional measures that are usually adopted in image processing.
The most commonly used measure of reconstructed image quality is the peak signal-to-noise ratio (PSNR), representing the ratio between the maximum possible pixel value of the image and the power of the distorting noise that affects the quality of its representation, usually represented by the RMSE itself:
(4) $\mathrm{PSNR} = 20\,\log_{10}\left(\frac{\max(Y)}{\mathrm{RMSE}}\right)$.
The PSNR can be seen as an approximation to human perception of reconstruction quality, where the higher the value, the better the quality of the image.
The third error measure is the structural similarity index measure (SSIM) proposed by Wang et al., which is widely used for measuring image quality and especially the similarity between two images. The concept is based on the idea that while the PSNR estimates perceived errors to quantify image degradation, the SSIM captures changes in perceived structural information. That is, if a reconstructed image is altered with different types of degradation (for instance, mean-shifted, blurred or affected by salt-and-pepper impulse noise), the RMSE may come out nearly identical in all cases, while the SSIM will capture the different perceptual quality, being a weighted combination of luminance, contrast and structure measurements. The mathematical formulation of the SSIM between two images, $X$ and $Y$, is given by
(5) $\mathrm{SSIM}(X,Y) = \frac{\left(2\mu_X\mu_Y + c_1\right)\left(2\sigma_{XY} + c_2\right)}{\left(\mu_X^2 + \mu_Y^2 + c_1\right)\left(\sigma_X^2 + \sigma_Y^2 + c_2\right)}$,
where $\mu_X$ and $\mu_Y$ are the mean values of $X$ and $Y$, respectively; $\sigma_{XY}$ is the cross-correlation of $X$ and $Y$; $\sigma_X^2$ and $\sigma_Y^2$ are the variances of $X$ and $Y$, respectively; and $c_1$ and $c_2$ are the regularization constants for the luminance, contrast and structural terms.
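A compact NumPy sketch of the three measures defined in Eqs. (3)-(5) is given below. The global (single-window) form of the SSIM and the choices of data range and regularization constants are assumptions; library implementations such as skimage.metrics.structural_similarity use a sliding window instead:

```python
import numpy as np

def rmse(x, y):
    """Eq. (3): root-mean-squared error over (valid) pixels."""
    return np.sqrt(np.nanmean((x - y) ** 2))

def psnr(x, y, data_range=None):
    """Eq. (4): peak signal-to-noise ratio in dB; using the dynamic range of
    the ground truth as the peak value is an assumption."""
    if data_range is None:
        data_range = np.nanmax(y) - np.nanmin(y)
    return 20.0 * np.log10(data_range / rmse(x, y))

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Eq. (5) computed globally; c1 and c2 are assumed constants."""
    mx, my = np.nanmean(x), np.nanmean(y)
    vx, vy = np.nanvar(x), np.nanvar(y)
    cov = np.nanmean((x - mx) * (y - my))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```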
All these errors are computed for the final reconstructed maps over the whole Mediterranean Sea. The recombination of the super-resolved tiles is made by considering a linear combination of the values obtained for the same pixel, weighted according to their position within the tile; that is, the weight of each value decreases as its distance from the central pixel of the tile increases. Finally, the large-scale field, initially removed to isolate the small-scale features, is added back to reproduce the large-scale structure of the original SST field.
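A minimal sketch of this distance-weighted recombination follows; the triangular taper is an assumption, since the text only states that weights decrease away from the tile centre:

```python
import numpy as np

def blend_tiles(tiles, origins, shape, tile=100):
    """Recombine overlapping super-resolved tiles into one map: each pixel is
    a weighted mean of the tile values covering it, with weights decaying
    away from the tile centre (triangular taper, an assumed profile)."""
    yy, xx = np.mgrid[0:tile, 0:tile]
    c = (tile - 1) / 2.0
    dist = np.maximum(np.abs(yy - c), np.abs(xx - c))
    w = 1.0 - dist / (c + 1.0)           # 1 at the centre, ~0 at the edges
    num = np.zeros(shape)
    den = np.zeros(shape)
    for patch, (j, i) in zip(tiles, origins):
        num[j:j + tile, i:i + tile] += w * patch
        den[j:j + tile, i:i + tile] += w
    return num / np.maximum(den, 1e-9)
```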
3 Results and discussion
Our aim is to verify whether the dADRSR network, trained by means of satellite-derived observations, is able to improve the effective resolution of the SST fields in the areas where our interpolation technique removes most of the spatial variability associated with mesoscale and submesoscale processes. Due to computational costs, a preliminary study based on a restricted test dataset was performed. The performance of the dADRSR network in comparison with other deep learning algorithms and a sensitivity analysis on the impact of different architectural choices are presented in Sect. 3.1. Finally, an additional test on an independent dataset built on a 1-year-long series of SST fields was carried out (Sect. 3.2).
Figure 4
Comparison of the SST fields of 1 August 2020 provided by the reconstruction of the dADRSR (a), the L3S data measured by Sentinel-3A and Sentinel-3B (b), and the first-guess SST field (c).
3.1 Preliminary validation study
Figure 4 shows the comparison of the result obtained on one of the SST fields included in the test dataset, corresponding to the SST daily map of 1 August 2020. The reconstruction of the dADRSR network at 0.01° spatial resolution over the Mediterranean Sea is shown in the top panel. In order to visually evaluate the reconstruction of the network, the corresponding L3S SST merged field observed by Sentinel-3A and Sentinel-3B (central panel) and the first-guess map used in the optimal interpolation algorithm (bottom panel) are shown below the network output. What is evident from this figure is that the SST features estimated by the CNN appear much sharper than the ones approximated by the low-resolution map, showing promising capabilities to effectively reconstruct dynamical features.
Figure 5
Comparison of the SST (a–c) and SST gradients (d–f) provided by the reconstruction of the dADRSR (a, d), the L3S data measured by Sentinel-3A and Sentinel-3B (b, e), and the optimally interpolated first guess (c, f) in the selected region delimited by the coordinates 30–40° N, 19–36° E.
To highlight the ability of the CNN to capture small-scale features more accurately than the statistical algorithm, we show in Fig. 5 three smaller panels corresponding to zoomed-in regions of the fields shown in Fig. 4, delimited by the coordinates 30–40° N, 19–36° E, with the corresponding SST spatial gradients (calculated using the Sobel operator) for the dADRSR reconstruction, the L3S Sentinel ground-truth data and the first-guess approximation (from top to bottom, respectively). This particularly structure-rich area is an excellent example to demonstrate the ability of the network to capture dynamical processes that are quite clear in the high-resolution L3S Sentinel data (shown in the central panels). In fact, SST fronts are strongly connected with the surface dynamics, and they are generally associated with energetic motions at the mesoscale and submesoscale. While the CNN is able to capture at least the most energetic structures (as shown in the top panels of Fig. 5), the first guess presents an extremely smooth field, due to the OI algorithm applied, in which all the small-scale features have been filtered out even where high-resolution data are present. We recall that the Sobel gradient operates on a 3 pixel × 3 pixel kernel, a spatial scale not readily visible in the figures. This is why the first-guess gradient field is so much weaker than the CNN or SLSTR fields even though the large-scale structures are similar.
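For reference, the gradient maps discussed above can be computed along the following lines; the omission of the grid-spacing scaling (which only rescales the magnitude) is an assumption:

```python
import numpy as np
from scipy import ndimage

def sst_gradient_magnitude(sst):
    """Gradient magnitude from the 3x3 Sobel operator, as used for the
    gradient panels of Fig. 5 (grid-spacing scaling omitted here)."""
    gx = ndimage.sobel(sst, axis=1)   # zonal component
    gy = ndimage.sobel(sst, axis=0)   # meridional component
    return np.hypot(gx, gy)
```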
Figure 6
Comparison of the performance of the SST (a) and SST gradient (b) dADRSR reconstruction and the first guess with respect to the L3S data measured by the Sentinel-3A and Sentinel-3B satellites. Positive red values show an improvement of the network reconstruction with respect to the optimally interpolated first guess.
This visual analysis is quantitatively confirmed by the maps in Fig. 6, displaying the difference between the error made by the low-resolution approximation and that made by the dADRSR model with respect to the original L3S Sentinel image, averaged over 1° × 1° boxes. Here, red indicates an improvement of the CNN with respect to the first-guess image and blue a degradation. A clear predominance of red boxes is found in both the SST and the SST gradient error maps.
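The error-difference maps can be obtained with a simple block average, sketched below under the assumption that absolute errors are compared and that 100 pixels of the 0.01° grid make up one 1° box:

```python
import numpy as np

def boxed_error_difference(truth, first_guess, network, box_px=100):
    """Absolute first-guess error minus absolute network error versus the
    L3S truth, block-averaged over box_px x box_px cells (100 px of the
    0.01 deg grid = 1 deg). Positive values favor the network."""
    diff = np.abs(first_guess - truth) - np.abs(network - truth)
    ny, nx = diff.shape
    ny_b, nx_b = ny // box_px, nx // box_px
    blocks = diff[:ny_b * box_px, :nx_b * box_px].reshape(
        ny_b, box_px, nx_b, box_px)
    return np.nanmean(blocks, axis=(1, 3))
```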
In Table 1, the comparison is summarized quantitatively, with the network reconstruction presenting an RMSE of 0.31 °C (versus the 0.33 °C obtained from the first-guess approximation), a mean PSNR equal to 37.9 and an SSIM equal to 0.54, both larger than the first-guess results. To highlight the quality of the dADRSR reconstruction, we performed the same test using other deep learning super-resolution models: the enhanced deep residual network for super resolution (EDSR), the adaptive deep residual (ADR) network for super resolution and the dADRSR with the number of M-ARBs set to 6 instead of 12 (called dADRSR/2 hereinafter). We recall that while the RMSE should be low to ensure a good approximation, for the other two quantities high values indicate an improvement. The dADRSR output achieves the best value for all the evaluation measures presented.
Table 1. Error estimations of the SST given by the first-guess map, the EDSR network, the ADR reconstruction, the dADRSR built with half of the M-ARBs (called dADRSR/2) and the dADRSR output with respect to the L3S ground truth: the RMSE given by Eq. (3) with the corresponding confidence interval calculated by a bootstrapping procedure, the peak signal-to-noise ratio obtained by Eq. (4), and the structural similarity index measure given by Eq. (5).
Model | RMSE [°C] | PSNR | SSIM
---|---|---|---
First guess | 0.33 ± 7 × 10⁻⁴ | 37.5 | 0.53
EDSR | 0.32 ± 6 × 10⁻⁴ | 37.7 | 0.54
ADR | 0.32 ± 6 × 10⁻⁴ | 37.9 | 0.54
dADRSR/2 | 0.32 ± 8 × 10⁻⁴ | 37.7 | 0.54
dADRSR | 0.31 ± 7 × 10⁻⁴ | 37.9 | 0.54
Figure 7
The PSD of the SST (b–d) and SST gradients (e–g) reconstructed by the dADRSR (in yellow), the first-guess map (in red) and the L3S Sentinel high-resolution observations (in blue) over the three zones delimited by black rectangles on the L3S SST field on 1 August 2020 in panel (a).
To analyze the effectiveness of the high-resolution reconstruction, we compare the power spectral density (PSD) of the network output with that of the L3S Sentinel product and the first-guess map over three selected zones. We compute the PSD via a fast Fourier transform (FFT) with a Blackman–Harris window over the three areas corresponding to the boxes labeled a, b and c in Fig. 7a (a minimal sketch of this computation is given after the list below). The zones are chosen in order to have the maximum number of valid pixels available in the L3S Sentinel observations and represent the following different dynamical regimes:
- a. a region over the Sea of Sardinia with low spatial variability;
- b. a region over the Ionian Sea with pronounced SST variability between the eastern and the western parts of the area; and
- c. a region over the Levantine Sea characterized by small-scale structures.
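As anticipated above, a sketch of the windowed PSD computation follows. Averaging per-row spectra over a gap-free box is an assumption about how the regional spectra are aggregated; scipy's periodogram applies the Blackman-Harris window before the FFT:

```python
import numpy as np
from scipy.signal import periodogram

def transect_psd(sst_line, dx_deg):
    """PSD of a 1-D SST transect: Blackman-Harris window, linear detrending,
    wavenumbers in cycles per degree when dx_deg is the grid step in degrees."""
    k, psd = periodogram(sst_line, fs=1.0 / dx_deg,
                         window="blackmanharris", detrend="linear")
    return k[1:], psd[1:]                # drop the zero wavenumber

def region_psd(sst_box, dx_deg):
    """Average the per-row spectra of a gap-free 2-D SST box (the way the
    regional spectra are aggregated here is an assumption)."""
    spectra = [transect_psd(row, dx_deg)[1] for row in sst_box]
    k = transect_psd(sst_box[0], dx_deg)[0]
    return k, np.mean(spectra, axis=0)
```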
Figure 8
The ratio between the spectral content of the mapping error and the spectral content of the L3S observation of the SST for the dADRSR-reconstructed fields (in yellow) and the first-guess map (in red) over the three zones delimited by black rectangles on the L3S SST field on 1 August 2020 presented in Fig. 7a. The solid black line represents the ratio equal to 0.5, and the dashed and dotted black lines mark the intersections with the PSD ratios of the first guess and the dADRSR, respectively, showing the effective resolution of the products.
The three central panels of Fig. 7 show the PSD, presented as a function of the wavenumber, of the SST reconstructed by the dADRSR (in yellow), the first-guess map (in red) and the high-resolution L3S Sentinel observations (in blue) over the three zones delimited by the black rectangles on the L3S SST field in the top panel. In all these cases, note that the PSD follows the same behavior for wavenumbers smaller than 1 per degree, i.e., for scales larger than roughly 100 km. This means that, for such scales, the SST fields obtained by both the CNN and the first guess characterize large mesoscale features well. Conversely, for regions a and b, both the first-guess and the network reconstructions exhibit a significant PSD decrease for wavenumbers higher than 1 per degree, but starting from approximately 10 per degree the first-guess spectrum separates from the dADRSR spectrum (and consequently increases its distance from the L3S Sentinel one), indicating a poorer reconstruction of spatial features below roughly 10 km. The network spectrum, on the other hand, shows that the machine learning algorithm captures those small-scale features better in all the cases. Analogous behavior is found for the spectra of the SST gradients over the same three regions (three bottom panels). The abrupt decreases seen in the L3S Sentinel and first-guess spectra starting at about 3 × 10 per degree are probably due to artifacts introduced by the re-gridding, leading to a lack of physical meaning for the spectra from this point onwards (i.e., for the highest wavenumbers). Overall, we can conclude that the CNN spectra track the observed spectra from 1 to 10 per degree quite well, while the first-guess spectra are more energetic in this region. On the other hand, although the CNN spectra are less energetic than the L3S Sentinel spectra for wavenumbers greater than 10 per degree, they are more representative of the true spectra in this region than the first-guess spectra.
Figure 9
The power spectral density profiles (b–d) of different SST products under cloudy conditions calculated along the three transects (black lines) shown on the L3S SST field of 20 December 2020 (a). The MED L4 NRT HR product at 0.0625° (Copernicus) is in blue, the MED L4 NRT UHR at 0.01° (Copernicus) is in red, the GLOBAL OCEAN OSTIA product at 0.05° (Copernicus) is in yellow, the Multi-scale Ultra-high Resolution (MUR) product at 0.01° (NASA-JPL) is in purple and the dADRSR reconstruction developed in this work is in green.
To quantify the differences between the effective resolutions of the products, we calculated the PSD ratio (RPSD) between the spectral content of the mapping error (e.g., the difference between the L3S data and the reconstructed field) and the spectral content of the L3S Sentinel observation of the SST for the dADRSR-reconstructed fields and the first-guess map (see Fig. 8) as follows:
(6) $\mathrm{RPSD}(\lambda) = \frac{\mathrm{PSD}_{\Delta}(\lambda)}{\mathrm{PSD}_{\mathrm{L3S}}(\lambda)}$,
where $\lambda$ is the wavelength, $\mathrm{PSD}_{\mathrm{L3S}}$ is the spectrum of the L3S Sentinel observations, and $\mathrm{PSD}_{\Delta}$ is the spectrum of the differences between the L3S data and either the output of the CNN or the first-guess field. The RPSD defined in Eq. (6) is calculated over the three zones delimited by black rectangles on the L3S SST field of 1 August 2020 presented in Fig. 7a. The effective resolution of the products, based on the intersection between this PSD ratio and the 0.5 threshold, shows the ability of the dADRSR network to resolve smaller scales.
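A sketch of the corresponding computation is given below; the linear interpolation of the crossing point and the assumption that the ratio rises monotonically through the 0.5 threshold are simplifications:

```python
import numpy as np

def effective_resolution(k, psd_err, psd_obs, threshold=0.5):
    """Eq. (6): RPSD = PSD of the mapping error / PSD of the observations.
    The effective resolution is the wavelength where RPSD crosses 0.5."""
    rpsd = psd_err / psd_obs
    above = np.flatnonzero(rpsd >= threshold)
    if above.size == 0:
        return None                      # ratio never reaches the threshold
    i = above[0]
    if i == 0:
        return 1.0 / k[0]
    # Linear interpolation of the crossing (assumes rpsd rises through 0.5).
    k_c = np.interp(threshold, rpsd[i - 1:i + 1], k[i - 1:i + 1])
    return 1.0 / k_c                     # wavelength in 1 / units of k
```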
Figure 10
Comparison of the daily RMSE time series of the SST obtained via the dADRSR reconstruction (red) and the L4 first guess (blue) with respect to the L3S data measured by the Sentinel-3A and Sentinel-3B satellites during the year 2021. The jump at Julian day 134 is due to the absence of L3 observations for that day, which made the comparison with the input and the output of the network impossible.
In order to demonstrate the effective spatial resolution enhancement of the super-resolved images under cloudy conditions, we performed a power spectral density analysis on different SST products available over the Mediterranean Sea (Fig. 9). The products used are the L4 NRT HR at a nominal 0.0625° resolution and the L4 NRT UHR at a nominal 0.01° spatial resolution provided by CNR for the Copernicus Marine Service (10.48670/moi-00172), the Global Ocean OSTIA product developed by the UK Met Office at a 0.05° resolution (10.48670/moi-00165), the Multi-scale Ultra-high Resolution (MUR) product provided by NASA-JPL at a 0.01° resolution, and the super-resolved SST field obtained by the application of the dADRSR network developed in this work at a 0.01° resolution. The PSDs are computed along the three transects (a, b and c in Fig. 9a), chosen in three areas affected (at different levels) by cloud coverage. In all cases, the slope of the spectra is very similar for the three products at higher resolution (namely, the UHR, the MUR and the dADRSR approximations), while lower values are observed for the HR and OSTIA products (which do not reach wavenumbers larger than 10 per degree by construction). The PSD analysis for transect a (Fig. 9b) shows that the dADRSR approximation is not affected by excessive noise as the other products are, presenting PSD values that are higher than the corresponding values of the MUR and UHR spectra (especially at small scales). For transects b (Fig. 9c) and c (Fig. 9d), the green line representing the dADRSR spectrum lies above all the other lines for almost the entire wavenumber range.
3.2 An extensive test dataset
Having established the potential of the network, we performed an additional validation test with the aim of strengthening the robustness of the statistics. We built a new, totally independent dataset exploiting the L4 SST fields used as the initial guess for the OI algorithm that produces gap-free SST maps over the Mediterranean Sea, covering the year 2021 at a 0.01° × 0.01° spatial resolution. Before applying the dADRSR network to this dataset, the fields were preprocessed following the same steps described in Sect. 2.1. The comparison between the evolution of the daily basin-scale RMSE for the input first guesses and the dADRSR reconstructions over the entire year confirms the improvement of the network output with respect to the first-guess fields (Fig. 10). The mean values of the RMSE for 2021 are in line with the ones found for the previous test (i.e., 0.31 °C for the dADRSR output versus 0.33 °C for the first-guess approximation).
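The daily basin-scale statistics can be sketched as below; masking to jointly valid pixels and returning NaN for days with no L3S observations (such as Julian day 134 mentioned in the caption of Fig. 10) are assumptions about the bookkeeping:

```python
import numpy as np

def daily_rmse_series(l3s_stack, recon_stack):
    """Daily basin-scale RMSE versus the (gappy) L3S observations. Days with
    no valid L3S pixels yield NaN (e.g., Julian day 134 in 2021)."""
    series = []
    for obs, rec in zip(l3s_stack, recon_stack):
        mask = np.isfinite(obs) & np.isfinite(rec)
        series.append(np.sqrt(np.mean((rec[mask] - obs[mask]) ** 2))
                      if mask.any() else np.nan)
    return np.array(series)
```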
Figure 11
Comparison of the performance of the SST (a) and SST gradient (b) dADRSR reconstruction and the L4 first guess with respect to the L3S data measured by the Sentinel-3A and Sentinel-3B satellites during the year 2021. Positive red values show an improvement of the network reconstruction with respect to the optimally interpolated first guess.
Analogous to Fig. 6, the differences (averaged over 1° × 1° boxes) between the errors of the two methods in approximating the L3S Sentinel-observed data (Fig. 11) present an overall improvement of the output of the dADRSR network with respect to the interpolated maps. Positive red values are found almost everywhere for both the SST and the SST gradient fields.
4 Conclusions
The advances obtained by the application of machine-learning-based techniques to the improvement of the effective resolution of remote sensing observations have recently opened a new way to approach satellite-derived data processing. The great advantages provided by making high-resolution gap-free images available to a wide range of scientific users are severely limited by the number of valid L3 observations. In the case of sea surface temperature measurements, infrared data are commonly contaminated by cloud cover, reducing the quality of the L4 data that can be obtained via statistical interpolation techniques. The machine learning approach used here exploits the progress made in the field of computer vision to extrapolate high-resolution features even where a direct measurement is missing. Learning directly from L3S SST fields and taking advantage of both dilated convolutions and attention mechanisms, the deep neural network employed here proved able to reproduce small-scale signals that are generally smoothed out by optimal interpolation algorithms. The strong variability of the SST in the Mediterranean Sea allowed us to obtain excellent results even when considering only 1 year of data during the training phase. However, it is important to investigate whether using longer time series may help to improve the network's ability to reconstruct SST fields, as well as to rely on more robust statistics. Moreover, given the inhomogeneity of the spatial error distribution related to the interpolation technique, it would be interesting to expand the present investigation to take into account the OI error field as an additional predictor and to consider the contribution of the error in the SST gradients in the loss function. Another aspect that deserves further investigation concerns the applicability of the dADRSR network to different sea and ocean areas, even though fine-tuning the model would probably be needed.
In the future, we also plan to study other super-resolution techniques that have recently become popular in the field of computer vision. Firstly, we are currently investigating the possibility of improving the reconstruction of small-scale features in SST fields via other successful generative AI approaches, such as GANs or diffusion models. The former exploit the outcome of a min–max game between a generator of reconstructed images and a discriminator that tries to distinguish the real image from the output of the other network; the latter build super-resolved fields by initially introducing noise into the signal and then reversing this process until it converges to the desired distribution. However, although such networks applied to remote sensing data have proven able to reconstruct very realistic small-scale structures, they seem to fail to optimize a point-match evaluation of reconstructed remotely sensed SST fields. Secondly, we would like to explore the usage of vision transformers (ViTs) for understanding and reproducing high-level structures through the contextual relationships between the patches of an image.
The results achieved here, however, may already benefit a wide range of applications. Super-resolved SST fields would facilitate the challenging task of two-dimensional/three-dimensional ocean dynamics reconstruction in synergy with other variables.
Code availability
The main core of the dADRSR code is based on the network developed by Buongiorno Nardelli et al. (2022), which is available at:
Data availability
The datasets used to train and validate the neural networks in this work were derived from the Mediterranean Sea High Resolution and Ultra High Resolution Sea Surface Temperature Analysis at Near Real Time, available through the Copernicus Marine Service Information (CMEMS) Marine Data Store (MDS) (10.48670/moi-00172).
Author contributions
CF, AP and BBN: conceptualization; CF, DC, AP and BBN: methodology; CF, DC, AP and BBN: investigation; CF: visualization and writing (original draft); and CF, DC, AP and BBN: writing (review and editing). All authors have read and agreed to the published version of the paper.
Competing interests
The contact author has declared that none of the authors has any competing interests.
Disclaimer
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
Special issue statement
This article is part of the special issue “Special Issue for the 54th International Liège Colloquium on Machine Learning and Data Analysis in Oceanography”. It is a result of the 54th International Liège Colloquium on Ocean Dynamics Machine Learning and Data Analysis in Oceanography, Liège, Belgium, 8–12 May 2023.
Acknowledgements
The authors wish to thank the reviewers for the useful feedback and, in particular, Peter Cornillon for his thoughtful suggestions, which we think have significantly improved the quality of this work.
Financial support
This work has been supported by the Copernicus Marine Service through contract no. 21001L03-COP-TAC SST-2300 via the Provision of Sea Surface Temperature Thematic Assembly Center (SST-TAC) Observation Products project.
Review statement
This paper was edited by Julien Brajard and reviewed by Anastase Alexandre Charantonis, Peter Cornillon, Haoyu Wang, and one anonymous referee.
© 2024. This work is published under the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/).
Abstract
Sea surface temperature (SST) is one of the essential variables of the Earth's climate system. Being at the air–sea interface, SST modulates heat fluxes in and out of the ocean, provides insight into several upper and interior ocean dynamical processes, and is a fundamental indicator of climate variability potentially impacting the health of marine ecosystems. Its accurate estimation and regular monitoring from space is therefore crucial. However, even if satellite infrared/microwave measurements provide much better coverage than what is achievable from in situ platforms, they cannot sense the sea surface under cloudy and rainy conditions. Large gaps are present even in merged multi-sensor satellite products, and different statistical strategies, mostly based on optimal interpolation (OI) algorithms, have thus been proposed to obtain gap-free (L4) images. These techniques, however, filter out the signals below the space–time decorrelation scales considered, significantly smoothing most of the small mesoscale and submesoscale features. Here, deep learning models, originally designed for single-image super resolution (SR), are applied to enhance the effective resolution of SST products and the accuracy of SST gradients. SR schemes include a set of computer vision techniques leveraging convolutional neural networks to retrieve high-resolution data from low-resolution images. A dilated convolutional multi-scale learning network, which includes an adaptive residual strategy and implements a channel attention mechanism, is used to reconstruct features in SST data at
Author affiliations
1 Consiglio Nazionale delle Ricerche, Istituto di Scienze Marine (CNR-ISMAR), Calata Porta di Massa, 80133 Naples, Italy
2 Consiglio Nazionale delle Ricerche, Istituto di Scienze Marine (CNR-ISMAR), Via del Fosso del Cavaliere 100, 00133 Rome, Italy