Out-of-the-box calving-front detection method

1 Introduction

Unlike the large majority of land-terminating glaciers, marine and lake-terminating (MALT) glaciers reach a water body at a low elevation. The contact surface is often referred to as the calving front. Their ice is lost by submarine melting or calving, i.e., ice that breaks off, becomes disconnected, and starts to float freely in the form of icebergs or ice floes. Both processes, submarine melting and calving, determine the total frontal ablation, i.e., the mass loss at the calving front. Frontal ablation is often a dominant factor in the total mass budget of MALT glaciers. Besides its importance in the total glacier mass balance, the representation of processes controlling frontal ablation is currently a pressing task for numerical glacier models. Neglecting frontal ablation can introduce an important bias: an analysis of the impact on ice thickness reconstruction (based on mass conservation) in Alaska reported an underestimation of 19 % on regional scales and up to 30 % on glacier scales. Various successful approaches exist to parameterize frontal ablation for individual glaciers. Still, the implementation in large-scale or global models is limited by the amount and quality of measurements for constraining the models. Thus, large-scale measurements (ideally time series) of frontal ablation are demanded by the modeling community. Driving forces of frontal ablation are, on the one hand, ice flux (higher flux, e.g., due to bed lubrication by meltwater, can trigger calving events) and, on the other hand, marine factors such as ocean temperature, fjord bathymetry, and sea ice or ice mélange conditions. For example, the persistence of the ice mélange in front of the glacier can stabilize the calving front and affect the glacier dynamics. In contrast, the breakup of the ice mélange can lead to increased calving and ice flow at the glacier terminus. Moreover, a significant frontal retreat can also indicate a retreat of the grounding zone.
The retreat of the grounding zone of a glacier with retrograde bedrock formation will lead to further grounding-zone retreat, resulting in increased ice loss and destabilization of the glacier or ice stream . Thus, information on the temporal variability of the calving-front position provides fundamental information on the state of the glacier or ice stream. Therefore, the glacier area has been defined as an essential climate variable (ECV) product by the World Meteorological Organization (WMO). Calving-front positions were usually manually mapped using different remote sensing imagery . Only a few studies applied automatic or semi-automatic approaches. In polar regions, the ocean downstream of the glaciers is often covered by sea ice and calved-off icebergs, forming the so-called ice mélange, making calving-front delineation a challenging task, even when captured by hand. Deep learning approaches have shown high potential for carrying out such complex segmentation tasks, e.g., on medical imagery . In recent years, the application of deep learning techniques for glacier front detection started . Calving fronts can be located in both optical and synthetic aperture radar (SAR) imagery. In optical imagery, calving fronts are more easily distinguishable, whereas synthetic aperture radar imagery has a higher scene availability as this is independent of daytime, season, and cloud coverage . A direct comparison between the results of existing deep-learning-based calving-front extraction studies is not possible as the models have been trained on different data, tested on different test sets, and evaluated using slightly differing metrics.

The benchmark dataset provides 681 SAR images of calving fronts. SAR imagery is independent of sunlight and cloud coverage, enabling continuous temporal coverage of the observation area, but compared to optical data, it has only one channel and more speckle noise. For every SAR image, two labels are provided. One label provides four classes: ocean, glacier, rock, and no information available (e.g., radar shadow and layover areas, areas outside the swath). The other label marks the calving front with a one-pixel-wide line. Based on the training set of the dataset, the dataset's authors train a modified U-Net for each label. One U-Net solves the task of glacier segmentation, and one detects the calving front. On the two test glaciers of the dataset, the segmentation model achieves an IoU (Intersection over Union) of $69.7 \pm 0.6$. By taking the boundary between ocean and glacier, they extract a prediction of the calving front from the zone prediction with a mean distance error (MDE) of $753 \pm 76$ m. The model trained directly on the front label achieved an MDE of $887 \pm 189$ m.

In this work, we present a method that utilizes both labels of the dataset instead of training separate models for each task (see Fig. 1). Multi-task learning (MTL) is a technique for machine learning algorithms that uses one model to tackle multiple tasks. In most cases, the potentially larger dataset and higher information content lead to higher performance for the individual tasks.

Our method is based on the nnU-Net (no new U-Net) proposed by , which is an out-of-the-box framework for training the U-Net. The framework contains proven deep learning techniques and established hyperparameter values, as well as rule-based parameters that depend on the properties of the dataset and available GPU memory. Therefore, the nnU-Net is a powerful tool that simplifies the application of deep learning algorithms. Its performance in the segmentation of SAR data has not been tested, and the availability of two labels suggests the modification of the nnU-Net for MTL. As a baseline for our evaluation of the nnU-Net, we use the results of .

In particular, our contributions are as follows: (1) visualizations showing the temporal and spatial distribution of the dataset by ; (2) application and evaluation of the nnU-Net for calving-front detection and zone segmentation; (3) two different modifications of the original nnU-Net to incorporate both labels for multi-task learning; (4) testing if an artificial third label improves the calving-front detection; (5) introduction of an efficient approach that fuses the two labels of the dataset; and (6) analysis of the influence of season, glacier, and satellite on the performance of the model.

The paper is organized as follows: after presenting the related work in Sect. 2, we give an overview of the dataset in Sect. 3. Section 4 explains the method and our six experimental setups. Section 5 examines the results and analyses the influence of different properties of the satellite images. The final section summarises the work.

Figure 1

Illustration of the modified nnU-Net for simultaneous prediction of landscape zones and the glacier front. On the left is an exemplary satellite image of the Crane Glacier taken by TDX on 24 June 2011. On the right, the upper image shows the four segmentation classes: ocean (white), glacier (light gray), rock (dark gray), and no information available (black). The lower image shows the glacier's calving front as a white line versus the black background.

[Figure omitted. See PDF]

2 Related work

Automated monitoring of glacier-covered areas is a growing research field. Recent glacier monitoring uses deep learning methods due to the increasing availability of satellite images and computing power. Many methods are based on the U-Net . They modify the vanilla U-Net for better performance in calving-front detection . One approach is to segment the images into different areas and extract the calving front as the border between segmentation areas . Another approach directly trains a model on the position of the calving front . This task suffers from severe class imbalance due to the thin calving front. Researchers approach this problem by creating a distance map from every pixel to the front line. The network is trained on the distance map instead of the thin front line. The actual front-line prediction is then extracted during post-processing.

Other works use the segmentation network DeepLabv3 to detect calving fronts. The main advantage of DeepLabv3 over U-Net is the atrous spatial pyramid pooling, which makes the network adaptable to different image resolutions .

The Calving Front Machine (CALFIN) segments optical and SAR images into the ocean–land zones and extracts the calving front during post-processing with a topography map of the area. Researchers apply MTL with a late-branching architecture. They use two labels: a binary ocean mask and a binary calving-front mask. They achieve state-of-the-art predictions with an 86 m deviation from the measured calving front. A detailed comparison to the aforementioned U-Net-based methods revealed the generalization ability of CALFIN to other glacier datasets. The images had to be down-sampled for CALFIN. Therefore, the distance in meters doubles, but comparing the pixel distance errors reveals a similar performance. With 29 million parameters, CALFIN is still a large network that needs a large amount of training data. A similar approach has been proposed for medical image segmentation, showing that the individual tasks benefit from MTL. Several other approaches have advanced the U-Net architecture for MTL for medical applications. Another work uses MTL for the segmentation of a binary ocean–land mask and for edge detection of the Antarctic coastline, where the calving front is just part of the coastline. It adds task-specific heads for the two tasks to the U-Net and achieves results with a deviation from the reference of 345 m compared to 483 m with the vanilla U-Net. To avoid distortions of the metric from areas far from the coast, the metric is calculated within 2 km of the true coastline. A further development of the holistically nested edge detection (HED)-UNet for Antarctic ice shelf front detection has also been proposed. Its post-processing includes an elevation threshold of 110 m to remove erroneous classifications in the high-altitude dry-snow zones.

Figure 2

Location of the seven benchmark glaciers: Columbia Glacier in Alaska; Crane, Mapple, Jorum, Dinsmoore–Bombardier–Edgeworth (DBE), and Sjögren-Inlet glaciers on the Antarctic Peninsula; and Jakobshavn Isbrae Glacier in Greenland. The background images are taken from the Google Maps Satellite imagery layer © Google 2019.

[Figure omitted. See PDF]

3 Dataset

The dataset used in this work is provided by . It contains 681 synthetic aperture radar (SAR) images of seven marine-terminating glaciers taken by six different satellites. Two glaciers are located in the Northern Hemisphere, namely Columbia Glacier in Alaska and Jakobshavn Isbrae (Sermeq Kujalleq) (JAK) in Greenland. The five glaciers in the Southern Hemisphere are all located on the Antarctic Peninsula (see Fig. 2). The Crane, Mapple, and Jorum glaciers are closest to the South Pole, followed by Dinsmoore–Bombardier–Edgeworth (DBE). The DBE glaciers are located so close together that they are treated as one glacier site. The last glacier is the Sjögren-Inlet Glacier.

Table 1

Properties of the dataset, including a list of captured glaciers, train–test split, number of images per glacier, and covered area.

Region	Alaska	Antarctic Peninsula	Antarctic Peninsula	Antarctic Peninsula	Antarctic Peninsula	Antarctic Peninsula	Greenland
Glacier	Columbia	Mapple	Crane	Jorum	DBE	Sjögren-Inlet	Jakobshavn
Split (test: 122, train: 559)	test	test	train	train	train	train	train
No. of images	65	57	69	77	133	121	159
Area [km$^2$]	$32 \times 15$	$8 \times 8$	$19 \times 25$	$20 \times 13$	$22 \times 20$	$23 \times 19$	$16 \times 19$

Table 1 lists the seven glacier sites with the number of images and the area they show. The table also shows the train–test split. The samples of the training set are used for the parameter optimization of the segmentation method, and the test set is exclusively used for evaluating the method. The test set contains the glaciers Mapple and Columbia. Columbia is the only mountain glacier in the dataset. The other glaciers are outlet glaciers of polar ice sheets. Therefore, Columbia can be seen as a benchmark glacier for the generalization capability of the segmentation method. Columbia and Mapple also differ in terms of the shape of the glacier. The images of Columbia show multiple calving fronts in one image, and the glacier arms flow in different directions. Mapple, on the other hand, is a less complex glacier, moving in only one direction with one clearly defined calving front between the lateral fjord side walls.

Figure 3

Sample images of every glacier in the training set and their corresponding labels. The first row shows the front label with a black background and a one-pixel-wide white line representing the calving front. The second row contains the zone labels with four classes: ocean (white), glacier (light gray), rock (dark gray), and no information available (black). Black dots in the ocean zone of Crane are areas with no data available due to a small artifact at the former calving front of Crane Glacier in the digital elevation model used for the orthorectification of the SAR data. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

The dataset contains two labels and a bounding box of the glacier for each SAR image. One label shows a mask of the different zones of the glacier (ocean, glacier, no information available, rock). The other label contains a one-pixel-wide line representing the calving front. The bounding box is not utilized for our method. A sample of each glacier in the training set with its corresponding labels is shown in Fig. 3. Predicting the zone mask can be seen as a classic segmentation problem. The calving front can also be extracted from the zone label by taking the border between ocean and ice areas in the corresponding bounding box. The direct delineation of the calving front is a more difficult task due to the high class imbalance: fewer than 1 % of the pixels are labeled as front pixels. Additionally, the front class does not form a compact region but a thin line.

The position of the calving front changes along with the glacier area. In Fig. 4, the retreat of Columbia is visualized by plotting the calving-front position of every sample in a different color. The bright lines represent past calving fronts, and the dark-red lines represent the more recent positions. The constant retreat of Columbia is visible through the tiered lines. The visualization of the glaciers in the Southern Hemisphere is included in Appendix . The change in the class distribution of the zone labels over time is shown in Appendix .

Every glacier is captured by multiple satellites for a higher temporal resolution and extended observation periods, meaning that recordings of one glacier are captured by different SAR systems with different image resolutions. The resolution refers to the ground range resolution. The Environmental Satellite (ENVISAT), European Remote Sensing Satellite 2 (ERS-2), and Sentinel-1 (S1) have a resolution of 20 m, the phased-array-type L-band SAR (PALSAR) has a 17 m resolution, and TanDEM-X (TDX) has a 7 m per pixel resolution. Unfortunately, the dataset does not provide information about the polarization. In Fig. 5, a timeline of the images of each glacier visualizes the observation time and frequency of the images. The first two rows show the glaciers of the test set.

Figure 4

Evolution of the calving-front position of Columbia Glacier (Alaska). The red lines are all calving-front labels available in the dataset. Background: SAR intensity image acquired by TDX on 13 November 2011. Projection CRS EPSG:32606 – WGS 84/UTM zone 6N. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Figure 5

Distribution of the dataset's images over time. The samples are grouped by the seven glaciers and colored according to the capturing satellite. We abbreviated the following glaciers: Columbia (COL), Dinsmoore–Bombardier–Edgeworth (DBE), Sjögren-Inlet (SI), and Jakobshavn Isbrae (JAK).

[Figure omitted. See PDF]

4 Method

In this section, we explain our method, which includes the utilization of the nnU-Net as a framework that simplifies the training of a U-Net. We document our six experimental setups that aim to evaluate the impact of multi-task learning (MTL) on the training of the U-Net.

4.1 Background

In the field of deep learning, a lot of time and effort is put into the hyperparameter search. The nnU-Net automates the manual tuning of hyperparameters. It is a framework around the U-Net architecture that provides good default values for hyperparameters, rules to adapt hyperparameters to the dataset, and many established deep learning techniques for training and inference. Most of the rule-based hyperparameters are only relevant for three-dimensional (3D) data in the medical domain, like computed tomography (CT) and magnetic resonance imaging (MRI), and are irrelevant to our experiments. For all image modalities except CT, $z$-score normalization is applied. Each image is normalized independently by subtracting its mean and dividing by its standard deviation. Another dataset parameter is the median shape of the samples. The shape and the available graphics processing unit (GPU) memory determine the patch and batch size and, therefore, the network topology. The patch size is initialized with the median shape and iteratively reduced while adapting the network topology accordingly until the network can be trained with a batch size of at least two within the given GPU memory constraints.
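The per-image $z$-score normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's own code: the function name `z_score_normalize` and the small constant `eps` are assumptions introduced here, and the image is represented as a plain list of rows.

```python
import math


def z_score_normalize(image, eps=1e-8):
    """Normalize one image (list of rows) to zero mean and unit standard deviation.

    eps is an assumed safeguard against division by zero for constant images.
    """
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((value - mean) ** 2 for value in pixels) / len(pixels)
    std = math.sqrt(variance)
    return [[(value - mean) / (std + eps) for value in row] for row in image]


if __name__ == "__main__":
    sar_patch = [[0.0, 2.0], [4.0, 6.0]]  # toy intensities, not real SAR data
    print(z_score_normalize(sar_patch))
```

Because every image is normalized with its own statistics, images from sensors with different intensity ranges end up on a comparable scale.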

In addition to the rule-based parameters, there are fixed parameters. These parameters are based on the authors' experience and generalize well across various tasks. The nnU-Net uses a poly learning rate scheduler, a combination of dice coefficient and cross-entropy as the loss function, and stochastic gradient descent (SGD) with a Nesterov momentum as the optimizer. The dice coefficient also helps with the problem of class imbalance. The coefficient of each class is weighted by the number of pixels in the label that relate to the class. Additionally, deep supervision is used to avoid vanishing gradients in neural networks with many layers. Independent of the dataset size, one epoch is defined as 250 mini-batches with foreground oversampling. The foreground oversampling is especially helpful for our application because the class imbalance between the calving front and background is high. It ensures that (at least) one-third of the patches for training are guaranteed to contain a foreground class. In our case, every batch contains two patches, from which one is forced to contain calving-front pixels.
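The step from "one-third of the patches" to "one of two patches per batch" follows from rounding. The sketch below assumes a rounding rule modelled on the one used in nnU-Net's data loading; the function name `force_foreground` and the default fraction of 0.33 are assumptions made for illustration.

```python
def force_foreground(batch_index, batch_size, oversample_fraction=0.33):
    """Return True if the patch at batch_index must contain a foreground class.

    Assumed rule: the first round(batch_size * (1 - fraction)) patches are
    sampled freely; the remaining ones are forced to contain foreground.
    """
    free_patches = round(batch_size * (1 - oversample_fraction))
    return batch_index >= free_patches


if __name__ == "__main__":
    # With a batch size of two, the second patch is forced to contain foreground.
    print([force_foreground(i, batch_size=2) for i in range(2)])
```

Under this rule, a batch size of two gives one freely sampled patch and one forced foreground patch, so the effective foreground share is one-half rather than one-third.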

This framework achieves robust results for various medical image segmentation tasks. Overall, nnU-Net sets a new state of the art in 33 out of 53 segmentation tasks across several international biomedical segmentation challenges and otherwise shows performances that are on par with or close to the top leaderboard entries. However, the performance of the nnU-Net in segmenting glacier SAR images is yet to be tested.

Figure 6

Illustration of the six experiments: two single-task-learning (STL) experiments (blue), two multi-task-learning (MTL) experiments with different architectures (green), one MTL experiment with an additional task of delineating the whole glacier boundary (yellow), and one experiment where the two labels are fused into one segmentation task with an additional class in the zone label (orange).

[Figure omitted. See PDF]

4.2 Multi-task learning with nnU-Net

The first research goal is to apply the nnU-Net out of the box to glacier front detection and to glacier zone segmentation. Training the nnU-Net directly on the front labels is the most straightforward approach for calving-front detection. The nnU-Net is intended to be used in a single-task-learning (STL) manner. In Fig. 6, these two baseline experiments are represented by the two blue columns on the left. The label of the calving front is dilated to a width of five pixels. Our preliminary experiments have shown that dilation makes the predictions more robust. For the training with zone labels, the post-processing includes extracting the boundary between the ocean and the glacier.
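The widening of the one-pixel front label can be illustrated with a plain binary dilation. The sketch assumes a square $5 \times 5$ structuring element, which the text does not specify; the function name `dilate_front` is hypothetical. With a square element, an axis-aligned line becomes exactly five pixels wide, whereas diagonal segments become somewhat wider.

```python
def dilate_front(mask, kernel_size=5):
    """Binary dilation of a front mask (list of rows of 0/1) with a square kernel."""
    radius = kernel_size // 2
    height, width = len(mask), len(mask[0])
    dilated = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            # Set every pixel inside the kernel window, clipped at the image border.
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    dilated[rr][cc] = 1
    return dilated


if __name__ == "__main__":
    line = [[1 if c == 4 else 0 for c in range(9)] for _ in range(7)]
    for row in dilate_front(line):
        print(row)
```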

In the following experiments, the segmentation problem changes to a multi-task problem, where both labels are used to train one model. The next two experiments concern the network architecture. They are represented by the two green columns in Fig. 6. The early-branching architecture uses one decoder for every label. Thus, the number of parameters increases by about 50 %. In contrast, the late-branching architecture requires only a small change to the vanilla U-Net. An additional channel of the last layer is used to predict the second label. For this architecture, only the weights of one kernel are trained in addition to the set of parameters needed for one task. The change in the total number of parameters that need to be trained is negligible.

Figure 7

Illustration of early- and late-branching U-Net architectures for multi-task learning (MTL). Both architectures perform joint feature extraction. The early-branching architecture applies separate reconstruction with two decoders, one for each task. The late-branching network applies joint reconstruction with a common decoder.

[Figure omitted. See PDF]

Because architecture changes for multi-task learning were not foreseen in the nnU-Net framework, we had to modify the framework. During experiment planning, i.e., the pre-processing phase, we fixed the estimated size of the network architecture to that of the early-branching network so that late- and early-branching networks return the same value for the network size. Otherwise, early- and late-branching networks would be trained on different patch sizes, and performance differences during the evaluation might arise from the different patch sizes rather than the differing architectures. During training, the error of all labels is calculated and summed up with equal weighting. For the inference, we adapted only the number of channels in the case of MTL. After the test samples are divided into patches and fed through the network, the patch predictions are combined into the prediction of the whole image. The predictions of the zones are post-processed to get an additional result for the position of the calving front: all glacier pixels with a neighboring pixel classified as ocean are classified as glacier front. We note that the prediction for the other task is omitted in the inference.
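The extraction of the front from a zone prediction can be sketched as a simple neighborhood test. The class ids and the choice of a 4-connected neighborhood are assumptions made for this illustration; the text only states that glacier pixels with a neighboring ocean pixel are marked as front.

```python
OCEAN, GLACIER, ROCK, NO_INFO = 0, 1, 2, 3  # hypothetical class ids


def extract_front(zones):
    """Mark every glacier pixel that has a 4-connected ocean neighbor as front."""
    height, width = len(zones), len(zones[0])
    front = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if zones[r][c] != GLACIER:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width and zones[rr][cc] == OCEAN:
                    front[r][c] = 1
                    break
    return front


if __name__ == "__main__":
    zones = [
        [ROCK, ROCK, ROCK, ROCK],
        [GLACIER, GLACIER, OCEAN, OCEAN],
        [GLACIER, GLACIER, OCEAN, OCEAN],
        [ROCK, ROCK, ROCK, ROCK],
    ]
    for row in extract_front(zones):
        print(row)
```

Glacier pixels that touch only rock or no-information areas are left unmarked, so lateral glacier margins do not enter the front prediction.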

The last two experiments concern label changes. In Fig. 6, they are colored yellow and orange. The fifth experiment of this work (see Fig. 6, yellow) extracts the boundaries between the glacier zone and all other zones as a third segmentation task for the late-branching U-Net. The label of the glacier boundaries was extracted from the zone label. All glacier pixels with a neighboring rock or shadow pixel are classified as glacier boundaries. The hypothesis is that providing more information about the same sample benefits the performance of the U-Net in the individual tasks. The third segmentation task is not considered in the final evaluation. The last experiment fuses the zone and front labels by creating a fourth class in the zone label associated with the glacier front (see Fig. 6, orange). As the front line has a width of five pixels (35–100 m depending on the image resolution), the other zone classes are barely impaired. A post-processing step is added to the predictions. The front pixels are isolated to generate a front prediction for comparing the results with the other experiments. To generate a prediction of the zones, pixels classified as front are assigned to the ocean, and the glacier zone is dilated once with a $7 \times 7$ kernel.
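The label fusion and the first part of its post-processing can be sketched as follows. The class ids and function names are hypothetical, and the final dilation of the glacier zone with the $7 \times 7$ kernel is deliberately left out, because the text does not state how the dilation interacts with the rock and no-information classes.

```python
OCEAN, GLACIER, ROCK, NO_INFO, FRONT = 0, 1, 2, 3, 4  # hypothetical class ids


def fuse_labels(zones, front_mask):
    """Overwrite the zone class with FRONT wherever the (dilated) front mask is set."""
    return [
        [FRONT if is_front else zone for zone, is_front in zip(zone_row, front_row)]
        for zone_row, front_row in zip(zones, front_mask)
    ]


def isolate_front(fused):
    """Binary front prediction: 1 where the fused prediction is FRONT."""
    return [[1 if value == FRONT else 0 for value in row] for row in fused]


def front_to_ocean(fused):
    """First step of the zone recovery: reassign FRONT pixels to OCEAN."""
    return [[OCEAN if value == FRONT else value for value in row] for row in fused]


if __name__ == "__main__":
    fused = fuse_labels([[GLACIER, GLACIER, OCEAN, OCEAN]], [[0, 1, 1, 0]])
    print(fused, isolate_front(fused), front_to_ocean(fused))
```

The width range quoted in the text follows directly from the pixel sizes: five pixels correspond to $5 \times 7 = 35$ m for TDX and $5 \times 20 = 100$ m for the 20 m sensors.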

4.3 Experimental setup

The nnU-Net was trained on an NVIDIA RTX 3080 with 12 GB memory. The adapted network architecture has nine encoder blocks and eight decoder blocks. Each block consists of two convolutional layers, each followed by an instance normalization and a rectified linear unit (ReLU). The kernels of all convolutional layers have a size of $3 \times 3$. During training, one batch contains two images. The patch size of the experiments that include only one label is $1280 \times 1024$. The experiments that have more than one label have a patch size of $1024 \times 896$ because of the GPU memory limit. There are also fixed parameters that are independent of the dataset. These include the SGD optimizer with an initial learning rate of 0.01, a Nesterov momentum of 0.99, and a weight decay of $3 \times 10^{-5}$. Training of one epoch took between 100 and 160 s. The maximum number of epochs of 500 is reached in every training instance (due to limited resources, we reduced the original maximum number of epochs from 1000 to 500). The nnU-Net defines one epoch using a fixed number of iterations (250). In each iteration, the batch is sampled depending on the class distribution of the sample to counteract the class imbalance.
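The poly learning rate scheduler mentioned in Sect. 4.1 can be illustrated with the initial learning rate and epoch budget given above. The exponent of 0.9 is an assumption (it is the value commonly used with nnU-Net and is not stated in the text), and the function name is hypothetical.

```python
def poly_learning_rate(epoch, max_epochs=500, initial_lr=0.01, exponent=0.9):
    """Polynomial decay from initial_lr at epoch 0 to zero at max_epochs.

    The exponent 0.9 is an assumed default, not a value taken from the text.
    """
    return initial_lr * (1 - epoch / max_epochs) ** exponent


if __name__ == "__main__":
    for epoch in (0, 125, 250, 375, 499):
        print(epoch, poly_learning_rate(epoch))
```

Halving the epoch budget from 1000 to 500 compresses the same decay curve into fewer epochs, so the learning rate falls faster per epoch than in the original configuration.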

5 Evaluation

In this section, we will examine our evaluation metrics, compare the results of the six proposed experiments, and evaluate the results of the fused-label experiment. We used a 5-fold cross-validation for the evaluation to eliminate the weight initialization bias and the bias caused by a single split into training and validation sets. The metric scores of the individual models are averaged to get a robust measure independent of weight initialization and split.

5.1 Evaluation metrics

We use the mean distance error (MDE) as our main measure. It measures the distance of the predicted front $P$ to the ground-truth calving front $Q$ for all images in the test set $I$. For every pixel in the label front $Q$, the distance to the closest pixel in the predicted front $P$ is determined. Additionally, to make the metric symmetric, the distance to the closest pixel in the label front $Q$ is determined for every pixel in the predicted front $P$. These distances are averaged and taken as the mean distance between the two lines (see Fig. 8 and Eq. 1). We note that the front MDE is also calculated for the zone segmentation. We extract the front from the zone segmentation by classifying all glacier pixels with neighboring ocean pixels as the glacier front.
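The symmetric averaging described above can be sketched in Python. Fronts are represented as lists of (row, column) pixel coordinates, so the result is in pixels and would need to be multiplied by the pixel size to obtain meters. Skipping images without a predicted front is an assumption of this sketch, suggested by the separate reporting of such images in the evaluation.

```python
import math


def mean_distance_error(pairs):
    """Symmetric mean distance between predicted and labeled fronts.

    pairs: iterable of (P, Q), each a list of (row, col) pixel coordinates.
    Pairs with an empty front are skipped (assumption of this sketch).
    """
    total_distance = 0.0
    total_pixels = 0
    for predicted, label in pairs:
        if not predicted or not label:
            continue
        total_distance += sum(min(math.dist(p, q) for q in label) for p in predicted)
        total_distance += sum(min(math.dist(p, q) for p in predicted) for q in label)
        total_pixels += len(predicted) + len(label)
    if total_pixels == 0:
        raise ValueError("no image with both a predicted and a labeled front")
    return total_distance / total_pixels


if __name__ == "__main__":
    predicted = [(0, 0), (0, 1)]
    label = [(3, 0), (3, 1)]
    print(mean_distance_error([(predicted, label)]))
```

The brute-force nearest-pixel search is quadratic in the number of front pixels, which is acceptable for thin front lines; a distance transform would be the usual choice for larger masks.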

Figure 8

Visualization of the mean distance error (MDE) calculation between front label $Q$ and front prediction $P$ . The $dist (q, P)$ represents the distance between $q$ and the pixel in $P$ that is closest to $q$ .

[Figure omitted. See PDF]

$$\mathrm{MDE}(I) = \frac{1}{\sum_{(P, Q) \in I} (|P| + |Q|)} \sum_{(P, Q) \in I} \left( \sum_{p \in P} \min_{q \in Q} \| p - q \|_{2} + \sum_{q \in Q} \min_{p \in P} \| p - q \|_{2} \right) \qquad (1)$$

Additionally, we give classical segmentation metrics to evaluate the zone prediction. They include the Intersection over Union (IoU), which is defined as $IoU = \frac{T_{P}}{T_{P} + F_{P} + F_{N}}$, i.e., the true-positive ($T_{P}$) pixels over the sum of $T_{P}$, false-positive ($F_{P}$), and false-negative ($F_{N}$) pixels. Additionally, the F1 score is computed ($F1 = 2 \cdot \frac{pr \cdot re}{pr + re}$), which is a combination of recall ($re = \frac{T_{P}}{T_{P} + F_{N}}$) and precision ($pr = \frac{T_{P}}{T_{P} + F_{P}}$). Because this segmentation task is not binary (every sample has four classes), the metrics are calculated for each class individually and then averaged with equal weighting.
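The class-wise computation and equal-weight averaging of IoU and F1 can be sketched as follows. The F1 score is written in the equivalent form $2 T_{P} / (2 T_{P} + F_{P} + F_{N})$, which avoids a division by zero when a class has no true positives. Skipping classes that occur in neither prediction nor label is an assumption of this sketch.

```python
def macro_iou_f1(prediction, label, num_classes=4):
    """Per-class IoU and F1, averaged with equal weighting over the classes."""
    ious, f1_scores = [], []
    for k in range(num_classes):
        tp = fp = fn = 0
        for pred_row, label_row in zip(prediction, label):
            for p, t in zip(pred_row, label_row):
                if p == k and t == k:
                    tp += 1
                elif p == k:
                    fp += 1
                elif t == k:
                    fn += 1
        if tp + fp + fn == 0:
            continue  # class absent from both maps; skipped (assumption)
        ious.append(tp / (tp + fp + fn))
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    prediction = [[0, 0], [1, 1]]
    label = [[0, 1], [1, 1]]
    print(macro_iou_f1(prediction, label, num_classes=2))
```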

5.2 Results and discussion

In this section, we evaluate our experiments. In Sect. 5.2.1, we compare the performance of the six experiments described in Fig. 6. In Sect. 5.2.2, we analyze the prediction of the fused-label experiment and investigate the influence of season, glacier site, and satellite on the calving-front prediction performance.

5.2.1 Comparison of the six experiments

In Fig. 9, the MDE of every experiment is compared. The STL approach trained on the front labels has an MDE of $1103 \pm 72$ m, and the STL approach trained on the zone labels has an MDE of $1184 \pm 225$ m. The STL experiments differ mainly in their performance variance, which is larger for the model trained on the zone labels. The number of samples that are incorrectly predicted to have no front pixel is similar, with an average of $27.2 \pm 6.0$ out of $122$ for training on the front labels and $24.8 \pm 12.4$ out of $122$ for training on the zone labels.

Figure 9

Boxplot of the mean distance error (MDE) of the six experiments. Two columns are colored equally for the multi-task-learning (MTL) experiments. The single-task-learning (STL) experiments are colored in dark and bright blue. Panel (a) represents the front delineation based on the front segmentation (task 1), and panel (b) represents the front delineation based on the zone segmentation (task 2). The baseline is displayed as a gray line in each half.

[Figure omitted. See PDF]

The model trained on the fused labels achieves the lowest number of samples with false non-front detections ($3.2 \pm 2.0$ for the front label out of $122$ samples). The baseline is $1 \pm 1$ based on the zone segmentation and $7 \pm 3$ from the front prediction. All values can be seen in Table . All MTL models have a significantly lower MDE than the STL models, with a significance level of $α = 0.01$ using Student's $t$ test. The $t$ values of all experiment pairs are given in Tables and . The model that is additionally trained on the glacier boundary has the smallest MDE for both tasks. The MDE baseline is $887 \pm 189$ m for the front prediction and $753 \pm 76$ m for the zone. Overall, the MDE is similar for all MTL approaches. Student's $t$ test shows that the differences between the fused label and all other MTL approaches are insignificant ($α = 0.33$).
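For readers unfamiliar with the test statistic, the sketch below computes a two-sample Student's $t$ value with pooled variance. Whether the comparison in the text uses this pooled two-sample variant or a paired variant is not stated, so the choice here is an assumption; the function name is hypothetical, and the inputs would be the per-fold metric values of two experiments.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns the t value and the degrees of freedom (n_a + n_b - 2).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 in the denominator)
    var_b = statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_value = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t_value, n_a + n_b - 2
```

The resulting $t$ value is compared against the critical value of the $t$ distribution for the given degrees of freedom and significance level; with five folds per experiment, this sketch yields eight degrees of freedom.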

The metrics for the zone segmentation, shown in Table , follow a similar trend: an improvement in all MTL approaches compared to the STL approach and minor changes between the MTL approaches. The STL of the nnU-Net on the zone label achieves an IoU of $62.4 \pm 3.5$ and an F1 score of $71.7 \pm 3.2$. The model trained with the additional boundary task achieves the highest F1 score with $81.7 \pm 0.5$. This model's IoU is $72.6 \pm 0.4$. The baseline is an F1 score of $80.1 \pm 0.5$ and an IoU of $69.7 \pm 0.6$.

We consider the fused-label approach the most feasible method for the following reasons. Its performance is on par with the other experiments. Even though the MDE of the boundary experiment is slightly lower than that of the fused-label approach, Student's $t$ test showed that the difference is not significant. Moreover, the fused-label approach requires fewer parameters and no changes to the nnU-Net framework, making its use truly out of the box. Additionally, this approach has a relatively small number ($5$ out of $122$) of predictions with falsely non-detected fronts.

5.2.2 Analysis of the fused-label experiment

For the final evaluation, the zone segmentations of the five models that are trained during the 5-fold cross-validation on the fused labels are combined into a more robust ensemble prediction. The models differ in their weight initialization and train–validation split. However, they are all trained on the fused labels. The ensemble prediction is created by taking the mean of the class probabilities given by the individual models. We do not use the calving-front pixels directly from the prediction but apply the post-processing of the zone label to gain a slightly better MDE. The post-processing labels those edge pixels of the glacier zone that have neighboring ocean pixels as the calving front. The ensemble prediction achieves an MDE of $515 \pm 39$ m and $5$ out of $122$ predictions with no fronts, which is similar to the average performance of the single models. For the comparison between the experiments in Fig. 9, we did not use the ensemble prediction, so that the variance introduced by different train–validation splits and weight initializations remains visible.
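The averaging of class probabilities over the cross-validation models can be sketched as below. The nested-list layout `model_probs[m][k][r][c]` and the function name are assumptions chosen to keep the example dependency-free; the final class is taken as the one with the highest mean probability.

```python
def ensemble_prediction(model_probs):
    """Average class probabilities over models and return the per-pixel class index.

    model_probs[m][k][r][c] is the probability of class k at pixel (r, c)
    predicted by model m (layout assumed for this sketch).
    """
    num_models = len(model_probs)
    num_classes = len(model_probs[0])
    height, width = len(model_probs[0][0]), len(model_probs[0][0][0])
    labels = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            mean_probs = [
                sum(model[k][r][c] for model in model_probs) / num_models
                for k in range(num_classes)
            ]
            labels[r][c] = max(range(num_classes), key=mean_probs.__getitem__)
    return labels


if __name__ == "__main__":
    model_a = [[[0.9, 0.4]], [[0.1, 0.6]]]  # two classes, one row, two pixels
    model_b = [[[0.6, 0.1]], [[0.4, 0.9]]]
    print(ensemble_prediction([model_a, model_b]))
```

Averaging probabilities rather than voting on hard labels lets a confident model outweigh uncertain ones at a given pixel.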

Figure 10

Mean distance error (MDE) of the test set grouped by glaciers and subdivided into seasons. The orange boxplot in the first row shows all MDEs of the test set. The olive-green plot below shows the errors from the Columbia images, subdivided into seasons in the third and fourth rows. The last three rows show errors from the Mapple images. The dark-gray line shows the baseline of the zone prediction, and the light-gray line shows the baseline of the front prediction from . The ensemble model was trained with the fused zone and front labels. The rectangle reaches from the first quartile to the third quartile. The median is the middle line in the rectangle, and the dashed line represents the mean. The $x$ axis has a logarithmic scale; otherwise, the outliers would dominate the plot.

[Figure omitted. See PDF]

The distribution of the MDEs in the test set predictions is plotted in Fig. . In the first row, all errors of the prediction of the $117$ test samples are drawn as dots. The test set contains two glaciers: Mapple and Columbia. The rows below show the distribution of the MDEs in the two test glaciers separated into summer and winter seasons. The main difference in MDE is between glaciers, similarly to the baseline results of , with MDEs of $287 \pm 48$ m for Mapple and $840 \pm 48$ m for Columbia. The MDE of our method is, on average, $109 \pm 90$ m for Mapple and $930 \pm 1420$ m for Columbia. The difference in the mean MDEs is caused by a group of predictions with an error $> 1000$ m. The median value is 222 m for Columbia and 80 m for Mapple. The large differences between the glaciers might be related to their different shapes. Mapple has a simple calving front, a single line constrained by a straight valley. Columbia has multiple calving fronts, a stream coming from the left side of the image, one from the top, and another from the left (see Fig. ).

Figure 11

Mean distance error (MDE) of the test set grouped by the resolution of the synthetic aperture radar (SAR) image. The orange boxplot in the first row shows all MDEs of the test set. ENVISAT, ERS-2, and Sentinel-1 have a resolution of 20 m per pixel; the MDEs of images of this resolution are represented by the dark-red boxplot. PALSAR has a resolution of 17 m per pixel; the MDEs of images of this resolution are represented by the magenta boxplot. TDX has a resolution of 7 m per pixel; the MDEs of images of this resolution are represented by the green boxplot. The ensemble model was trained with the fused zone and front labels. The dark-gray line shows the baseline of the zone prediction, and the light-gray line shows the baseline of the front prediction from . The rectangle reaches from the first quartile to the third quartile. The median is the middle line in the rectangle, and the dashed line represents the mean. The $x$ axis has a logarithmic scale; otherwise, the outliers would dominate the plot.

[Figure omitted. See PDF]

Figure 12

Mean distance error (MDE) of the test set grouped by satellites. The dark-gray line shows the baseline of the zone prediction, and the light-gray line shows the baseline of the front prediction from . The images from ERS-2 (light blue), ENVISAT (yellow), and PALSAR (green) only capture the Mapple Glacier. The MDEs from TDX (pink) and Sentinel-1 (turquoise) are subdivided into the two glaciers of the test set. The ensemble model was trained with the fusion of zone and front labels. The rectangle reaches from the first quartile to the third quartile. The median is the middle line in the rectangle, and the dashed line represents the mean. The $x$ axis has a logarithmic scale; otherwise, the outliers would dominate the plot.

[Figure omitted. See PDF]

Figure 13

Calving front predictions of Columbia on (a) 11 February 2016 and (b) 22 June 2014 taken by TDX with 7 m pixel resolution (after multi-looking with a factor of $3 \times 3$ ); (c) was taken by Sentinel-1 on 10 September 2019 with 20 m pixel resolution. Note the ground truth (blue), the prediction (yellow), and the overlap of ground truth and prediction (magenta). SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Figure 14

Calving-front predictions of Mapple Glacier. Note the ground truth (blue), the prediction (yellow), the overlap of ground truth and prediction (magenta), and the bounding box (cyan). Panel (a) was taken by TDX on 13 October 2008. Panel (b) was taken by PALSAR on 29 November 2008. Panel (c) was taken by Sentinel-1 on 3 July 2016. Panel (d) was taken by Sentinel-1 on 2 March 2019. Panel (e) was taken by ERS on 5 February 2007. Panel (f) was taken by ENVISAT on 22 September 2007. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

There is also a seasonal difference in the MDE. The front predictions for the summer samples of Mapple and Columbia combined have a lower MDE ( $307 \pm 730$ m) than those for the samples captured during the winter months ( $818 \pm 1389$ m). However, the medians are closer together, with 150 m in the summer months and 181 m in the winter months. The baseline MDE is $732 \pm 93$ m in summer and $776 \pm 65$ m in winter, although most outliers are from winter seasons in Columbia. In winter, snow coverage of the glacier is more likely. The SAR signal can penetrate through several meters of snow cover, depending on the SAR frequency and the water content of the snow cover. However, most of the studied glaciers, in particular Columbia Glacier and the glaciers of the Antarctic Peninsula, are located in temperate marine environments, making wet-snow conditions likely even during winters. Thus, snow can cover features such as crevasses and rock structures that are useful for the pattern recognition of the nnU-Net. More important, however, is that the ocean next to the glaciers is covered more often by ice mélange during winters, reducing the contrast between ocean and glacier areas. Even for experienced human mappers, it can be challenging to distinguish between the glacier tongue and the ice mélange. Thus, we hypothesize that the network has similar issues, leading to reduced performance during winter. The difference between the seasonal MDEs also depends on the sensor. The MDE of the low-resolution sensors (Sentinel-1, ENVISAT) is lower in the summer months. The MDE of the high-resolution sensor (TDX) is lower in the winter months, but the difference is much smaller. A table of the MDE grouped by satellite and season is in Appendix A (Table A5).

An overview of the impact of the sensor resolution is given in Fig. . The images with a resolution of 7 m per pixel have a mean MDE of 180 m. The images with 17 m resolution have a mean MDE of 135 m, but this class contains only images of Mapple Glacier taken by PALSAR. The images with 20 m per pixel have a mean MDE of 1214 m. The distribution has a cluster of MDEs that are larger than 1000 m. These large errors only stem from images of Columbia (see Fig. , row Columbia_S1).

In Fig. , the MDE is grouped by satellites. The predictions for the samples captured by ERS, ENVISAT, and PALSAR have a similar average error between 150 and 300 m. However, these samples only represent the Mapple Glacier. TDX captures both test sites and has an MDE of $180 \pm 179$ m. The samples captured by Sentinel-1 have a higher MDE of $1664 \pm 1807$ m. All $14$ outliers, i.e., predictions with a front-line delineation error of $> 1000$ m, stem from images of Columbia taken by Sentinel-1. The neural network cannot generalize from the training set to this set of samples. At this low resolution, fine structures, e.g., cracks in the ice, are not visible. Figure 13c shows that the model falsely predicts ocean pixels as the front and does not predict the left calving front at all. The MDE for this sample is $4221 \pm 5026$ m. We suspect that the complex calving front of Columbia requires a high resolution for accurate detection or more training data from Sentinel-1. The group of outliers heavily increases the overall MDE but can be separated clearly by specifying the satellite and glacier. It shows the generalization limits of our method. For a visual inspection, we provide the predictions of the test set as single images and as an animation on Zenodo (10.5281/zenodo.8379954, ).
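Separating the outliers by satellite and glacier amounts to a simple grouping of the per-image errors. The sketch below uses made-up MDE values purely to illustrate the bookkeeping; the record layout, the function name, and all numbers are assumptions and are not taken from the test set.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical per-image results: (glacier, satellite, MDE in metres).
records = [
    ("Mapple", "TDX", 60.0),
    ("Mapple", "Sentinel-1", 120.0),
    ("Columbia", "TDX", 200.0),
    ("Columbia", "TDX", 300.0),
    ("Columbia", "Sentinel-1", 1500.0),
    ("Columbia", "Sentinel-1", 4000.0),
]


def summarize(records, outlier_threshold=1000.0):
    """Group MDEs by (glacier, satellite); report mean, median, outlier count."""
    groups = defaultdict(list)
    for glacier, satellite, mde in records:
        groups[(glacier, satellite)].append(mde)
    return {
        key: {
            "mean": mean(values),
            "median": median(values),
            "outliers": sum(value > outlier_threshold for value in values),
        }
        for key, values in groups.items()
    }


summary = summarize(records)
```

Reporting the median next to the mean for each group makes it visible when a few large errors, rather than a general loss of accuracy, drive the average.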

The MDE is negatively influenced by calving-front predictions that extend beyond the labeled calving front along the coastline, which can be seen in Fig. 13a and b. The MDE can therefore give a misleading impression of the usability of these predictions. The prediction in Fig. 13b has a relatively high MDE (393 m) because it predicts the coastline as a calving front. Despite this, the prediction can still be used to monitor the change in the position of the calving front and to calculate the total frontal ablation rate or surface area. The prediction can be corrected easily if the coastline is known. The misclassification of the coastline as a calving front holds true for nearly all images of Columbia taken by TDX. In the northern part of Columbia, it is particularly difficult to distinguish between the coastline and calving front because the glacier retreated so far that it transitioned from a marine-terminating glacier to a land-terminating glacier. Without prior knowledge, it is difficult even for humans to distinguish between the coastline and the calving front.
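The effect of a prediction that overshoots along the coastline can be illustrated with a toy example. The sketch assumes that the MDE is the symmetric mean of the nearest-neighbor distances between predicted and labeled front pixels, scaled by the pixel resolution; the function name and the pixel coordinates are hypothetical.

```python
import math


def mean_distance_error(pred, truth, metres_per_pixel=1.0):
    """Symmetric mean nearest-neighbor distance between two pixel sets."""

    def nearest(point, others):
        return min(math.dist(point, other) for other in others)

    distances = [nearest(p, truth) for p in pred]
    distances += [nearest(q, pred) for q in truth]
    return metres_per_pixel * sum(distances) / len(distances)


# Labeled front: a vertical line of 5 pixels at column 10.
truth = [(y, 10) for y in range(5)]
# Prediction A: identical to the label.
exact = list(truth)
# Prediction B: the same front plus 5 pixels continuing along a coastline.
overshoot = truth + [(y, 10) for y in range(5, 10)]

mde_exact = mean_distance_error(exact, truth, metres_per_pixel=7.0)
mde_overshoot = mean_distance_error(overshoot, truth, metres_per_pixel=7.0)
```

Prediction B contains every labeled front pixel, yet its error is nonzero because the additional coastline pixels each contribute their distance to the nearest labeled pixel.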

The model predictions of the calving front in images of Mapple have a high overlap with the label, even with a low resolution of $\geq 20$ m (ERS) (see Fig. ). There are also some outliers for Mapple, but they are not as severe as the outliers for Columbia. The prediction of Mapple with a high resolution of 7 m per pixel is close to the ground truth (Fig. ).

6 Conclusions

This work explores the use of the nnU-Net by in segmenting glacier-calving fronts. The nnU-Net promises an out-of-the-box application of deep learning for segmentation tasks. We evaluate this claim using a dataset of glacier images provided by . The dataset contains two tasks: the calving-front detection and the glacier zone segmentation. We try different modifications of multi-task learning (MTL) with two different neural-network architectures to tackle both tasks simultaneously. The results show that combining both tasks increases each task's performance. No significant difference between the two MTL architectures exists. Adding more domain-specific tasks like glacier boundary delineation does not further improve the previous tasks. Due to the small area of the front line, the two labels can be fused into one label. The fusion of labels decreases the number of parameters used, shortens the training time, and reduces the deep learning expertise needed as the nnU-Net can be used without modifications. This approach achieves an average MDE of $541 \pm 84$ m. We provide the code and the pre-trained model for application on further SAR images of glacier fronts (see “Code and data availability” section). The predictions need to be filtered manually since there can be outliers, as our results show. However, this will provide an initial prediction, which eases the task of glacier front delineation.
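Label fusion itself is a small preprocessing step. The following sketch assumes that the front pixels overwrite the zone class with one additional class id; the id values, the function name, and the nested-list layout are assumptions for illustration.

```python
FRONT_CLASS = 4  # hypothetical id appended after the four zone classes (0 to 3)


def fuse_labels(zone_label, front_mask):
    """Return a copy of the zone label with front pixels set to FRONT_CLASS."""
    return [
        [
            FRONT_CLASS if is_front else zone
            for zone, is_front in zip(zone_row, front_row)
        ]
        for zone_row, front_row in zip(zone_label, front_mask)
    ]


# Toy 2 x 3 example with glacier = 1 and ocean = 2 (hypothetical ids).
zone_label = [[1, 1, 2], [1, 2, 2]]
front_mask = [[False, True, False], [True, False, False]]
fused = fuse_labels(zone_label, front_mask)
```

Because the front covers only a thin line of pixels, overwriting them removes very little of the zone information, and the fused label can be fed to an unmodified single-task segmentation setup.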

To improve the average MDE, future work should focus on improving the model performance on low-resolution images, such as those of Sentinel-1 with a resolution of 20 m per pixel. This can be done by including more images taken by Sentinel-1 in the training set or by implementing an oversampling strategy for the low-resolution images.
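One possible oversampling scheme is inverse-frequency weighting, sketched below with the Python standard library. It is only an illustration of the idea and not a tested training recipe; the sensor counts, the function name, and the optional boost factors are hypothetical.

```python
import random
from collections import Counter


def oversampling_weights(sensors, boost=None):
    """Per-image weights: inverse sensor frequency times an optional boost."""
    boost = boost or {}
    counts = Counter(sensors)
    return [boost.get(sensor, 1.0) / counts[sensor] for sensor in sensors]


# Hypothetical training set: one sensor name per image.
sensors = ["TDX"] * 6 + ["Sentinel-1"] * 2
weights = oversampling_weights(sensors)

# Draw a batch of image indices; each sensor is now equally likely overall.
rng = random.Random(0)
batch = rng.choices(range(len(sensors)), weights=weights, k=4)
```

With these weights the two Sentinel-1 images are drawn as often in total as the six TDX images; a boost factor larger than 1 for a sensor would shift the balance further toward it.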

The framework nnU-Net is well suited to segmenting SAR images of glaciers and to delineating calving fronts. The modification of the nnU-Net for MTL improves the results compared to STL experiments, where only one label is used. However, it is not necessary for glacier segmentation and calving-front delineation because both labels can be fused without losing a significant amount of information. Our findings highlight the suitability of the nnU-Net for glacier front segmentation with multi-mission SAR remote sensing data, which will facilitate an efficient, extended spatiotemporal mapping of tidewater glacier terminus changes. Our findings also promote the out-of-the-box application of the nnU-Net for other segmentation tasks based on satellite imagery because we did not need to modify it for the calving-front detection.

Appendix A Evaluation results of all experiments

This section provides more detailed values of the evaluation. Tables A1 and A2 contain the $T$ values for significant and insignificant differences between models. Table A3 contains the zone MDE and the front MDE of the six experiments. The MDE is averaged over the five model results. The baseline from is also provided in the first two rows. The values correspond to Fig. . We provide Fig. to compare the two methods (nnU-Net and ). Table A5 contains the MDE of the fused-label training grouped by satellite and season.

Table A1

$T$ value for the significance of different front positions extracted from front predictions. $T$ values that surpass the threshold of 3.36 mean that the probability for the difference to be random is below 1 %. $T$ values below 1 mean that the probability for the difference to be random is at least 35 %, i.e., the difference is insignificant.

	Early MTL	Late MTL	Boundary MTL	Fused labels
STL	15.78	13.46	16.14	14.29
Early MTL	0	0.31	1.02	0.43
Late MTL		0	0.95	0.58
Bound. MTL			0	0.46

Table A2

$T$ value for the significance of different front positions extracted from zone predictions. $T$ values that surpass the threshold of 3.36 mean that the probability for the difference to be random is below 1 %. $T$ values below 1 mean that the probability for the difference to be random is at least 35 %, i.e., the difference is insignificant.

	Early MTL	Late MTL	Boundary MTL	Fused labels
STL	6.57	6.51	6.66	5.96
Early MTL	0	0.29	0.03	0.75
Late MTL		0	0.35	0.58
Bound. MTL			0	0.81

Table A3

MDE of all six experiments. Every experiment setup is trained five times with different weight initializations and train–validation splits, and the results are then averaged. The $\emptyset$ columns show the number of images, out of the $122$ test samples, for which the model falsely detected no front.

Model	Experiment	Modality	MDE front $↓$ [m]	$\emptyset$ front	MDE zone $↓$ [m]	$\emptyset$ zone
	Front	STL	$887 \pm 189$	$7 \pm 3$	–	–
	Zone	STL	–	–	$753 \pm 76$	$1 \pm 1$
nnU-Net	Front	STL	$1102 \pm 72$	$27.2 \pm 6.0$	–	–
	Zone	STL	–	–	$1184 \pm 255$	$24.8 \pm 12.4$
	Early	MTL	$560 \pm 25$	$8.6 \pm 3.0$	$509 \pm 43$	$2.2 \pm 0.4$
	Late	MTL	$568 \pm 51$	$8.4 \pm 3.5$	$516 \pm 40$	$4.0 \pm 1.1$
	Boundary	MTL	$543 \pm 27$	$5.6 \pm 1.4$	$509 \pm 19$	$4.4 \pm 1.8$
	Fused	MTL	$552 \pm 33$	$3.2 \pm 2.0$	$541 \pm 84$	$3.4 \pm 1.3$

Table A4

Segmentation metrics of all six experiments. Every experiment setup is trained five times with different weight initializations and train–validation splits. The metric results of each run are then averaged.

Model	Experiment	Modality	Precision $↑$	Recall $↑$	F1 $↑$	IoU $↑$
	Zone	STL	$84.2 \pm 0.5$	$79.6 \pm 0.9$	$80.1 \pm 0.5$	$69.7 \pm 0.6$
nnU-Net	Front	STL	–	–	–	–
	Zone	STL	$81.2 \pm 2.4$	$71.7 \pm 3.2$	$71.9 \pm 3.7$	$62.4 \pm 3.5$
	Early	MTL	$87.4 \pm 0.4$	$80.9 \pm 0.7$	$81.6 \pm 0.6$	$72.6 \pm 0.9$
	Late	MTL	$86.4 \pm 0.4$	$79.9 \pm 0.6$	$80.7 \pm 0.7$	$71.1 \pm 0.7$
	Boundary	MTL	$87.5 \pm 0.3$	$80.8 \pm 0.4$	$81.7 \pm 0.5$	$72.6 \pm 0.4$
	Fused	MTL	$87.0 \pm 0.2$	$79.1 \pm 1.7$	$80.7 \pm 0.1$	$70.8 \pm 1.8$

Table A5

MDE of the fused-label experiments grouped by sensor and season.

Satellite	S1	ENVISAT	ERS	PALSAR	TDX	all
Winter	$2529 \pm 1719$ m	$241 \pm 101$ m	–	–	$160 \pm 125$ m	$818 \pm 1389$ m
Summer	$798 \pm 1442$ m	$122 \pm 61$ m	$83 \pm 39$ m	$135 \pm 68$ m	$197 \pm 213$ m	$307 \pm 730$ m

Table A6

Comparison of the U-Net training setup and hyperparameters of and the nnU-Net .

Hyperparameter	U-Net	nnU-Net
Number of convolutional layers	10	34
Pooling	Max pooling	Strided convolution
Activation function	ReLU	Leaky ReLU
Patch size	$256 \times 256$	$1280 \times 1024$
Batch size	16	2
Optimizer	Cyclic learning rate scheduler in combination with the Adam optimizer	SGD with a Nesterov momentum of 0.99 and a weight decay of $3 \times 10^{-5}$
Initial learning rate	$4 \times 10^{- 5}$	$1 \times 10^{- 3}$
Gradient clipping	True	False
Deep supervision	False	True
Loss function	Combination of dice and cross-entropy	Combination of dice and cross-entropy

Appendix B Calving-front labels

This section displays the other six glacier sites, complementing Fig. . The calving fronts of the different time steps are colored with a gradient from bright for the oldest fronts to dark red for the most recent fronts.

Figure B1

Jakobshavn Isbrae Glacier from 1995 to 2014. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Figure B2

Mapple Glacier from 2006 to 2020. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Figure B3

Crane Glacier from 2002 to 2014. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Figure B4

Dinsmoor–Bombardier–Edgeworth glaciers from 1995 to 2014. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Figure B5

Jorum Glacier from 2003 to 2020. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Figure B6

Sjögren-Inlet Glacier from 1995 to 2014. SAR imagery was provided by DLR, ESA, and ASF.

[Figure omitted. See PDF]

Appendix C Temporal distribution of zone label

Figure shows the temporal distribution of the zone label. Jakobshavn and Columbia, the two glaciers in the Northern Hemisphere, show an increase in ocean area with a decreasing glacier area. The set of images of the glaciers on the Antarctic Peninsula, in particular, has samples with large areas with no available information. Low partial coverage by the radar swath causes prominent peaks of the no-information-available class. Jakobshavn's area distribution shows a repetitive structure of increasing and decreasing glacier area. This pattern represents seasonal changes.
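Such a temporal class distribution can be derived by counting the pixels per class in each zone label and ordering the results by acquisition date. The sketch below is a minimal illustration; the class ids, the dates, and the tiny label images are hypothetical.

```python
from collections import Counter

CLASS_NAMES = {0: "rock", 1: "glacier", 2: "ocean", 3: "NA"}  # hypothetical ids


def class_fractions(zone_label):
    """Fraction of pixels per zone class in one label image."""
    counts = Counter(pixel for row in zone_label for pixel in row)
    total = sum(counts.values())
    return {
        name: counts.get(class_id, 0) / total
        for class_id, name in CLASS_NAMES.items()
    }


# Two hypothetical labels of the same glacier, ordered by acquisition date.
series = [
    ("2010-01-01", [[1, 1, 2], [1, 1, 2]]),
    ("2015-01-01", [[1, 2, 2], [1, 2, 2]]),
]
timeline = [(date, class_fractions(label)) for date, label in series]
```

In this toy series the glacier fraction drops from two-thirds to one-third while the ocean fraction grows correspondingly, which is the kind of retreat signal described above.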

Figure C1

Changes of the class distributions in the zone labels of each glacier over time. The ocean is colored in blue, the glacier area is colored in white, the rock is colored in gray, and the area with no available information (NA) is colored in black. There is no radar reflection in the NA area due to terrain elevation causing shadows or due to limited coverage by the radar swath.

[Figure omitted. See PDF]

Code and data availability

The code is fundamentally based on the nnU-Net by , which is also publicly available (https://github.com/MIC-DKFZ/nnUNet/tree/nnunetv1, ). We modified the neural-network architecture and the corresponding pre- and post-processing. The modified version of the nnU-Net and our evaluation and visualization scripts are available on Zenodo (10.5281/zenodo.10168770, ), and a demo version is provided on Hugging Face (10.5281/zenodo.10169965, ). The calving-front predictions of the test set are available on Zenodo (10.5281/zenodo.8379954, ), as well as the pre-trained model (10.5281/zenodo.7837300, ). The dataset is provided by (10.1594/PANGAEA.940950).

Author contributions

OH and NG were responsible for conceptualizing the study. OH developed and implemented the methodology and software for the experiments. NG and TS provided the dataset and a benchmark score. NG and VC supervised the study and reviewed and edited the paper. JJF acquired financial support. OH created visualizations and prepared the paper with contributions from all the co-authors. JJF, VC, and NG reviewed and edited the paper.

Competing interests

At least one of the (co-)authors is a member of the editorial board of The Cryosphere. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

We acknowledge the free provision of the SAR data via various proposals from ESA, ASF, and DLR.

Financial support

This research has been supported by the European Research Council, H2020 European Research Council (grant no. 948290), the Staedtler Stiftung (Emerging Field Initiative (EFI), TAPE), the Bayerisches Staatsministerium für Wissenschaft und Kunst (Measuring and Modelling Mountain Glaciers in a Changing Climate (IDP M$^{3}$OCCA)), the Deutsche Forschungsgemeinschaft (National High-Performance Computing Center (NHR@FAU)), and the Universitätsbund Erlangen-Nürnberg (Open Access Publication Funding).

Review statement

This paper was edited by Stef Lhermitte and reviewed by two anonymous referees.

© 2023. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Glaciers across the globe react to the changing climate. Monitoring the transformation of glaciers is essential for projecting their contribution to global mean sea level rise. The delineation of glacier-calving fronts is an important part of the satellite-based monitoring process. This work presents a calving-front extraction method based on the deep learning framework nnU-Net, which stands for no new U-Net. The framework automates the training of a popular neural network, called U-Net, designed for segmentation tasks. Our presented method marks the calving front in synthetic aperture radar (SAR) images of glaciers. The images are taken by six different sensor systems. A benchmark dataset for calving-front extraction is used for training and evaluation. The dataset contains two labels for each image. One label denotes a classic image segmentation into different zones (glacier, ocean, rock, and no information available). The other label marks the edge between the glacier and the ocean, i.e., the calving front. In this work, the nnU-Net is modified to predict both labels simultaneously. In the field of machine learning, the prediction of multiple labels is referred to as multi-task learning (MTL). The resulting predictions of both labels benefit from simultaneous optimization. For further testing of the capabilities of MTL, two different network architectures are compared, and an additional task, the segmentation of the glacier outline, is added to the training. In the end, we show that fusing the label of the calving front and the zone label is the most efficient way to optimize both tasks with no significant accuracy reduction compared to the MTL neural-network architectures. The automatic detection of the calving front with an nnU-Net trained on fused labels improves from the baseline mean distance error (MDE) of $753 \pm 76$ to $541 \pm 84$ m. 
The scripts for our experiments are published on GitHub (https://github.com/ho11laqe/nnUNet_calvingfront_detection, last access: 20 November 2023). An easy-access version is published on Hugging Face (https://huggingface.co/spaces/ho11laqe/nnUNet_calvingfront_detection, last access: 20 November 2023).

Details

Title

Out-of-the-box calving-front detection method using deep learning

Author

Herrmann, Oskar¹; Gourmelon, Nora²; Seehaus, Thorsten¹; Maier, Andreas²; Fürst, Johannes J¹; Braun, Matthias H¹; Christlein, Vincent²

¹ Institute of Geography, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
² Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Pages

4957-4977

Publication year

2023

Publication date

2023

Publisher

Copernicus GmbH

ISSN

19940424

e-ISSN

19940416

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/tc-17-4957-2023

ProQuest document ID

2892891126
