1. Introduction
The automatic extraction of road networks from remote sensing images plays a critical role in applications such as urban planning and environmental monitoring [1]. Typically, the task aims to extract a road mask (Figure 1c) from an optical remote sensing image (Figure 1a). Recent developments in image segmentation have yielded impressive results, significantly aided by the availability of manually labeled datasets in challenges such as DeepGlobe (DG) and SpaceNet 3 (SN3) [2,3]. However, these datasets and methods are designed specifically for the electro-optical (EO) domain and do not perform well for other remote sensing modalities such as Synthetic Aperture Radar (SAR). SAR, while less prevalent than EO, offers unparalleled capabilities in remote sensing, such as all-weather and all-day operational capacity [4], which make it indispensable in scenarios like disaster response [5,6]. However, the imaging techniques and visual characteristics of EO and SAR images differ significantly, as shown in Figure 1a,b. This discrepancy explains why EO-based road segmentation methods cannot be directly applied to SAR images and highlights the necessity for specialized approaches in SAR road segmentation. A major challenge in this area is the lack of publicly available high-resolution SAR datasets, which are far more expensive and difficult to annotate than their EO counterparts.
In this context, we introduce the HybridSAR Road Dataset (HSRD), designed to overcome the lack of high-quality data in SAR road segmentation. This dataset package includes the SpaceNet 6 Road (SN6R) dataset, the DG-SAR dataset, and the SN3-SAR dataset, as illustrated in Figure 2. The SN6R dataset is the first of its kind to offer sub-meter resolution real SAR images for road segmentation, utilizing road labels from OpenStreetMap (OSM) [8] and SAR images from the SpaceNet 6 Challenge [9], which covers the city of Rotterdam. However, the SN6R dataset’s limited geographical diversity can lead to overfitting, a common issue in remote sensing [2]. To address this, we augmented the real SAR dataset with two synthesized datasets, DG-SAR and SN3-SAR, derived from the existing optical datasets DG and SN3. We employed a generative adversarial network (GAN) trained with the SN6-SAROPT dataset [7] to convert optical remote sensing images into pseudo-SAR images, significantly expanding the available training data. This method mitigates the scarcity of SAR data and addresses the domain gap inherent in using optical images for SAR data augmentation. A detailed summary of the HSRD is presented in Table 1. We then adapted a state-of-the-art road segmentation network, originally developed for the optical domain, to the SAR domain through an enhanced training framework that integrates both real and synthetic data. While the road segmentation method itself is not the primary innovation, its effective integration with our novel dataset construction significantly enhances its performance. Our experimental results show that the proposed HybridSAR Road Dataset and the adapted segmentation network achieve a performance comparable to that of state-of-the-art methods in the optical domain, demonstrating their efficacy in SAR road segmentation.
In summary, our work makes two significant contributions to the field of SAR road segmentation:
We present a comprehensive dataset package, the HybridSAR Road Dataset (HSRD). This is the first high-resolution road segmentation dataset in the SAR domain. It addresses the limitations of geographical diversity and the scarcity of labeled SAR data for road segmentation.
We adapt a road segmentation network, originally designed for the optical domain, to the SAR domain. We achieve notable segmentation results for SAR for the first time through an enhanced training framework that integrates both real and synthetic SAR data.
By bridging the gap between optical and SAR remote sensing domains for road segmentation, our work provides a robust solution that enhances the accuracy and reliability of SAR road segmentation. These contributions not only address current limitations but also pave the way for future advancements in remote sensing applications.
2. Related Works
2.1. Road Segmentation Methods
Automated road segmentation or extraction from overhead imagery has garnered significant interest, with the primary goal being the identification of road pixels within small image tiles. It can be considered a semantic segmentation task with two classes (road and non-road). Efforts in this domain vary: some focus on extracting the entire road surface, which involves accounting for road shape and width, while others concentrate on delineating only the road centerline.
Traditional road extraction techniques have utilized both pixel-based and object-based methods to distinguish road features, leveraging aspects such as spectrum, texture, and geometry [10]. Pixel-based approaches, exemplified in [11], concentrate on extracting detailed pixel-level features but often necessitate additional steps for noise reduction. Conversely, object-based strategies, demonstrated in [12,13,14], approach road identification by considering roads as holistic objects and incorporating spatial texture features. These methods enhance the accuracy and noise immunity of urban road segmentation, showing notable success in segmentation and the differentiation of similar objects.
In contrast to these traditional methods, deep learning (DL) techniques offer several advantages, including the ability to automatically learn hierarchical feature representations from data, which eliminates the need for manual feature selection and extraction. DL methods have demonstrated superior performance in handling complex scenes and variability in road appearance due to lighting, weather conditions, and occlusions. Commonly used DL architectures for image segmentation tasks have been effectively applied in road segmentation tasks, as shown in works like the DeepGlobe winning algorithm D-LinkNet [15] and the SpaceNet 3 winning algorithm DeepResUNet [16]. Inspired by the latter, CRESIv2 [17] was developed, which offers not only highly accurate city-scale road extraction but also speed estimation. However, all of these methods require substantial amounts of annotated data for training, which limits their application or adaptation in the SAR domain due to the lack of SAR data.
Recent advancements in deep learning have also seen the rise of transformer-based models, which have proven effective in various segmentation tasks. For example, in the domain of building extraction from remote sensing images, the use of Sparse Token Transformers (STTs) showed significant promise in [18]. This work explored the potential of using transformers for efficient building extraction, designing an efficient dual-pathway transformer structure that learns long-term dependencies of tokens in both spatial and channel dimensions. Their approach, which introduces a “sparse token sampler” to represent buildings as a set of sparse feature vectors, greatly reduces computational complexity while achieving state-of-the-art accuracy on benchmark building extraction datasets.
These innovations in building segmentation highlight the potential applicability of advanced transformer-based methods to road segmentation tasks. By leveraging the strengths of transformers in modeling long-range dependencies and efficiently handling sparse data representations, similar approaches can be adapted to improve road segmentation in SAR imagery. Incorporating such diverse methodologies provides a robust foundation for enhancing the performance of SAR road segmentation models, demonstrating significant improvements in both accuracy and efficiency.
2.2. Road Segmentation Datasets
Compared to SAR data, optical data have been more extensively utilized and studied, underpinning numerous advances in road segmentation. There exist several popular road datasets, such as DeepGlobe, SpaceNet 3, and the Massachusetts dataset, which have significantly propelled the aforementioned progress in this domain. These datasets offer comprehensive, manually labeled road data on a sub-meter resolution scale, serving as crucial resources for the development of advanced road segmentation models. Examples and a comparison of the three datasets are presented in Figure 3 and Table 2.
The DeepGlobe Dataset Introduced by the DeepGlobe Road Extraction Challenge, the DeepGlobe dataset (DG) [2] stands out as one of the inaugural large-scale datasets designed specifically for road segmentation from satellite imagery. Only the training set from the challenge is publicly available. Aimed at facilitating the automated extraction of road networks in disaster-affected and developing regions, it utilizes imagery sourced from DigitalGlobe to cover diverse landscapes across Thailand, Indonesia, and India. With a ground resolution of 50 cm/pixel across RGB channels, the dataset includes 6226 manually labeled image tiles of 1024 × 1024 pixels, complete with pixel-level segmentation masks, covering an area of 1632 km2 in Thailand.
The SpaceNet 3 Dataset The SpaceNet 3 dataset, emerging from the SpaceNet 3: Road Network Detection Challenge [3], represents another significant contribution to the field. Developed through a collaboration among CosmiQ Works, Radiant Solutions, and NVIDIA, this dataset aims to spur innovation in the application of computer vision and deep learning to satellite imagery. Spanning more than 8000 km of roads across four Areas of Interest (AoIs)—Vegas, Shanghai, Paris, and Khartoum—SpaceNet 3 introduces a broad geographic diversity to support the development of geographically generalizable algorithms. It presents labels meticulously crafted by the SpaceNet labeling team as geospatial vector files, alongside 1300 × 1300 pixel high-resolution image tiles that include both 8-Band Multi-Spectral and Panchromatic raster data from the WorldView-3 satellite, providing a 0.3 m/pixel resolution.
The Massachusetts Dataset The high costs and significant time required for manual labeling have prompted the use of open-source, lower-resolution optical images alongside OpenStreetMap (OSM) road centerlines as a cost-effective alternative [19]. The geo-referenced nature of Sentinel-1 and Sentinel-2 image patches enables the effective use of OSM as an alternative to human labeling efforts. This strategy has been employed in various works, including HD-maps [20], DeepResUNet [16], and RoadTracer [21], which leverage these resources for training models. Additionally, Google Maps has become a popular source for similar endeavors. Notably, the Massachusetts Dataset [22], among the most utilized road segmentation datasets in the optical domain, features Google Map imagery as tiles with OSM road layers serving as labels.
While SAR images, generated from radar signal reflections, offer distinct advantages by highlighting material properties and geometrical formations unique to radar wavelengths, their interpretation is more complex and costly compared to optical imagery. Despite significant advancements in SAR datasets for various applications, there is a notable lack of SAR road segmentation datasets. Existing SAR datasets, such as BigEarthNet [23], OpenSARUrban [24], and others, focus on different domains like scene classification or maritime object detection. To address this gap, we created SN6R, a novel dataset tailored for road segmentation in sub-meter resolution SAR imagery. We also made use of the optical datasets mentioned above, DG and SN3, adapting them to the SAR domain using an EO2SAR image translation model. This approach leverages existing EO-based datasets and methods to advance SAR-based road segmentation.
3. Construction of the HybridSAR Road Dataset
3.1. Motivation
As shown in Figure 2, the HybridSAR Road Dataset (HSRD) consists of three sub-datasets: the real SAR dataset SN6R and the synthetic datasets SN3-SAR and DG-SAR. We explain the construction of the SpaceNet 6 Road (SN6R) dataset in Section 3.2 and the construction of SN3-SAR and DG-SAR in Section 3.3. An overview of the dataset creation process is visualized in Figure 4. The construction of the HSRD is driven by several key motivations, addressing the limitations and challenges in current road segmentation models and datasets:
The primary motivation for constructing the HSRD is to address the critical issue of data scarcity in high-resolution SAR road segmentation. The lack of available high-resolution SAR datasets with annotated road labels significantly hampers the development of robust and accurate segmentation models. On one hand, we have unlabeled, geo-referenced SAR images; on the other, we have high-quality road labels for EO images. This motivates a hybrid data construction method that leverages both real SAR images paired with OpenStreetMap (OSM) road labels and synthetic SAR images generated from high-quality EO datasets.
Overcoming geographical and data diversity limitations is another crucial aspect of our motivation. Existing road segmentation models often suffer from overfitting when applied to new geographic regions, a challenge well documented in remote sensing research. This is largely due to the limited diversity in the available datasets, which are often confined to specific regions, resolutions, and imaging conditions. Additionally, inconsistencies in image tiling and labeling practices further exacerbate the overfitting problem, reducing the generalizability of these models. By integrating real SAR data (SN6R) with synthesized datasets (SN3-SAR and DG-SAR) derived from optical images, we create a more diverse and comprehensive training dataset. The use of a generative adversarial network (GAN) to convert optical images into pseudo-SAR images significantly expands the available training data, bridging the domain gap between EO and SAR imagery. This hybrid approach not only enhances the geographical diversity of the dataset but also ensures high-quality annotations, ultimately improving the performance and generalizability of SAR road segmentation models.
3.2. Constructing the SpaceNet 6 Road (SN6R) Dataset
We first construct the SpaceNet 6 Road (SN6R) dataset by integrating road layers from OpenStreetMap with SAR image tiles from the SpaceNet 6 Building Extraction Challenge. Comprising high-resolution SAR imagery from the SpaceNet 6 Building Extraction Challenge, this dataset is enriched with geospatially paired road masks derived from OpenStreetMap.
Image Source: The SpaceNet 6 Building (SN6B) Dataset Image Tiles The SpaceNet 6 Building Dataset offers a rich collection of SAR and optical imagery. It covers over 120 square kilometers of Rotterdam, The Netherlands, and includes 202 SAR image strips from Capella Space with a resolution of 0.25 m per pixel. Complementing this is a detailed optical image strip from Maxar WorldView 2, covering a similar area with a resolution of 0.5 m per pixel [9]. These images, formatted into SAR-intensity and RGB image tiles of 900 × 900 pixels, serve as the foundation for the SN6 Building Extraction Challenge [9].
For our study, we leveraged 3401 SAR-intensity image tiles from the SpaceNet 6 Building Extraction Challenge’s training set, which will be referred to as the SN6B dataset. We then derived the road masks for these image tiles from OSM.
Label Source: The OpenStreetMap (OSM) Road Layer OpenStreetMap (OSM), a collaborative project, provides a free, editable world map generated by volunteers. Its road layer is particularly valuable for representing global road networks in detail. Despite occasional temporal discrepancies and structural offsets due to imaging angles, OSM’s extensive and diverse data generally ensure its reliability for various applications, including road segmentation tasks, as shown in studies like the Massachusetts Roads dataset [22]. We derived road masks aligned with the geo-referenced images within the SN6B dataset from OSM. Our methodology encompassed several steps to ensure precision and relevance, including the following:
Downloading the most up-to-date roads layer of the relevant regions from Geofabrik (https://www.geofabrik.de/data/download.html, accessed on 30 March 2023) to reduce the query time and to avoid temporal inconsistency.
Extracting the coordinates from the geo-referenced SN6B image chips and converting them to the correct map Coordinate Reference System (CRS) to align with the OSM road layers.
Querying the coordinates in the OSM layer to extract the correct road vector file.
Filtering and standardizing the obtained road vectors to generate binary masks, focusing on the roads relevant to vehicular traffic.
Post-Processing Adhering to the SpaceNet Roads Dataset Labeling Guidelines [3], we excluded roads not suited for vehicular traffic and standardized road widths to 4 m. This decision was made to concentrate on the vehicular road network, recognizing that smaller paths and trails present a significantly different appearance and are not the focus of our analysis. The road vectors were then rasterized into binary road mask images, and the corresponding unavailable regions were masked out. The overall framework of the creation of the SN6R dataset is illustrated in Figure 5, and a minimal sketch of the mask-generation pipeline is given below.
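The following is a minimal sketch of the mask-generation steps described above, using geopandas and rasterio. The file names, the Geofabrik attribute column (fclass), the set of vehicular road classes, and the use of zero intensity to detect unavailable regions are illustrative assumptions; the actual pipeline may differ in detail.

```python
# Sketch of the SN6R road-mask generation pipeline (Section 3.2).
import geopandas as gpd
import numpy as np
import rasterio
from rasterio.features import rasterize
from shapely.geometry import box

TILE_PATH = "SN6_SAR-Intensity_tile_0001.tif"   # hypothetical tile file name
ROADS_PATH = "gis_osm_roads_free_1.shp"         # Geofabrik roads layer for the region

# Road classes kept as "relevant to vehicular traffic" (illustrative subset).
VEHICULAR = {"motorway", "trunk", "primary", "secondary",
             "tertiary", "residential", "unclassified", "service"}

with rasterio.open(TILE_PATH) as src:
    transform, crs = src.transform, src.crs
    height, width = src.height, src.width
    bounds = src.bounds
    sar = src.read(1)                           # used to mask out nodata regions

# 1) Load the OSM road layer and re-project it to the tile's CRS.
roads = gpd.read_file(ROADS_PATH).to_crs(crs)

# 2) Filter to vehicular roads and clip to the tile footprint.
roads = roads[roads["fclass"].isin(VEHICULAR)]
roads = roads.clip(box(*bounds))

# 3) Standardize the road width to 4 m by buffering centerlines by 2 m
#    (assumes a projected CRS with metric units, as for the SN6 tiles).
road_polys = roads.geometry.buffer(2.0)

# 4) Rasterize the buffered roads into a binary mask aligned with the tile.
mask = rasterize(
    [(geom, 1) for geom in road_polys if not geom.is_empty],
    out_shape=(height, width), transform=transform, fill=0, dtype="uint8")

# 5) Mask out regions that are unavailable (zero intensity) in the SAR tile.
mask[sar == 0] = 0

with rasterio.open("SN6R_road_mask_0001.tif", "w", driver="GTiff",
                   height=height, width=width, count=1, dtype="uint8",
                   crs=crs, transform=transform) as dst:
    dst.write(mask, 1)
```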
3.3. Constructing the SN3-SAR and DG-SAR Datasets
We constructed the SN3-SAR and DG-SAR datasets by converting the high-resolution EO images in the SN3 and DG datasets into synthetic SAR images and pairing them with the original high-quality road masks.
Image Source: The SpaceNet 3 (SN3) and DeepGlobe (DG) Image Tiles To obtain the image tiles for the two synthetic SAR datasets, we used a GAN-based image translation model, DDAGAN [7], to convert the EO image tiles in the SN3 and DG datasets into pseudo-SAR images. The characteristics of these EO datasets and the SN6-SAROPT dataset that was used to train the DDAGAN model are detailed in Table 3.
Rationale for Using DDAGAN for Synthesis Accurately synthesizing SAR data from optical images requires overcoming their inherent radiometric and physical disparities, which introduce nonlinear distortions in EO-SAR image pairs and lead to insufficient pixel-wise correspondence. To tackle this, we selected the Dual Distortion-Adaptive GAN (DDAGAN) model [7]. Generative Adversarial Networks (GANs) consist of two competing networks: a generator that produces synthetic data and a discriminator that distinguishes between real and generated data. Inspired by RegGAN [25], which incorporates an additional registration network for adaptive misalignment correction, DDAGAN introduces a distortion-adaptive module to each image domain, effectively addressing geometric distortions between EO and SAR images and achieving state-of-the-art results in both EO2SAR and SAR2EO image translation across various resolutions. DDAGAN demonstrates exceptional results on the SN6-SAROPT dataset, which is sourced from the same imagery as our SN6R dataset, making it well suited to our goal of generating high-quality synthetic SAR data to complement SN6R for road segmentation model training. The DDAGAN model employed in our study was trained on the SN6-SAROPT dataset, comprising 724 EO-SAR image pairs (512 × 512 pixels) covering identical RoIs.
Post-Processing The DG image tiles, matching the SN6-SAROPT tiles in dimension and spatial resolution, require no additional processing. The SN3 image tiles, however, must be split into smaller patches before being processed through DDAGAN, and the outputs are then stitched back together to preserve the original image size and resolution, similar to the process in [17]. The stitching process is illustrated in Figure 6.
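As an illustration of the split-and-stitch procedure in Figure 6, the sketch below splits a tile into 512 × 512 patches, passes each through an EO2SAR generator, and normalizes overlapping areas with a coverage map. The `eo2sar` callable stands in for the trained DDAGAN generator, and the stride value is an assumption chosen so that the windows fully cover a 1300 × 1300 tile.

```python
# Sketch of the 1300x1300 -> 512x512 split / EO2SAR / stitch-and-normalize step.
import numpy as np

def translate_tile(eo_tile: np.ndarray, eo2sar, patch: int = 512, stride: int = 394) -> np.ndarray:
    """Translate an HxW(xC) EO tile into a pseudo-SAR tile patch by patch."""
    h, w = eo_tile.shape[:2]
    out = np.zeros((h, w), dtype=np.float32)        # accumulated SAR intensities
    coverage = np.zeros((h, w), dtype=np.float32)   # how many patches hit each pixel

    ys = list(range(0, h - patch, stride)) + [h - patch]
    xs = list(range(0, w - patch, stride)) + [w - patch]
    for y in ys:
        for x in xs:
            patch_eo = eo_tile[y:y + patch, x:x + patch]
            patch_sar = eo2sar(patch_eo)            # DDAGAN EO->SAR generator (placeholder)
            out[y:y + patch, x:x + patch] += patch_sar
            coverage[y:y + patch, x:x + patch] += 1.0

    # Divide by the coverage map to normalize the overlapping areas (Figure 6, step 4).
    return out / np.maximum(coverage, 1.0)
```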
During the development of the SN3-SAR dataset, we observed that approximately 30% of the original EO image tiles were partially obscured and masked, a condition not represented in the SN6-SAROPT dataset. Consequently, these obstructed tiles posed challenges for the DDAGAN network, either failing to translate or incorrectly interpreting the masked areas as water surfaces, thereby complicating the road segmentation process. Notably, similar obstructions are present in the SN6R dataset, prompting our decision to retain these images but implement a re-masking process. This process entails identifying obstructed regions by isolating large clusters of black pixels in the input images. Despite potential inaccuracies, this method proves to be sufficiently effective. With re-masking applied, all tiles, including those with obstructions, are reintegrated into our workflow. This is followed by a stitching procedure to compile the adjusted tiles. The comparative results, showcasing the impact of the re-masking process, are depicted in Figure 7.
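Below is a hedged sketch of the re-masking heuristic: large connected clusters of black pixels in the input EO tile are treated as obstructed regions and zeroed out in the synthetic SAR output. The minimum-cluster-size threshold is an assumption; the paper only states that large clusters of black pixels are isolated.

```python
# Sketch of the re-masking step for obstructed SN3 tiles (Figure 7).
import numpy as np
from scipy import ndimage

def remask_obstructed(eo_tile: np.ndarray, sar_tile: np.ndarray,
                      min_area_frac: float = 0.01) -> np.ndarray:
    """Zero out synthetic SAR pixels that fall inside large black regions of the EO tile."""
    black = np.all(eo_tile == 0, axis=-1) if eo_tile.ndim == 3 else (eo_tile == 0)
    labels, n = ndimage.label(black)                      # connected clusters of black pixels
    sizes = ndimage.sum(black, labels, index=np.arange(1, n + 1))
    min_area = min_area_frac * black.size                 # assumed area threshold
    obstructed = np.isin(labels, np.nonzero(sizes >= min_area)[0] + 1)
    out = sar_tile.copy()
    out[obstructed] = 0                                   # re-apply the mask
    return out
```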
Label Source: The SpaceNet 3 (SN3) and DeepGlobe (DG) Road Labels The road masks in our SN3-SAR and DG-SAR datasets are taken from their respective EO datasets. For SN3-SAR, the released SN3 road labels are manually labeled road-centerline vectors, which are then converted into binary images. For DG-SAR, the released DG road labels are manually labeled road surface pixels, so no further processing is required.
The final results, namely the two synthetic SAR datasets, DG-SAR and SN3-SAR, are shown in Figure 8.
4. Methodology
Network Architecture We utilized the City-Scale Road Extraction from Satellite Imagery v2 (CRESIv2) network, a road segmentation model inspired by the winning entry of the SpaceNet 3 Road Extraction Challenge [3,17]. Originally, CRESIv2 was designed to extract road networks and predict road speeds simultaneously, taking three-channel RGB images as input and outputting multi-channel masks. However, we use it solely for road segmentation, producing binary masks. Our modifications to the model are minimal: by maintaining the core structure of the CRESIv2 network, we aimed to directly assess its efficacy in the SAR domain without extensive model alterations.
The encoder of our modified ResUNet architecture begins with an initial convolutional block, where the input image is processed by a 7 × 7 convolution with a stride of 2, followed by batch normalization and ReLU activation. This is followed by a series of residual blocks adapted from ResNet34, which progressively reduce the spatial dimensions of the feature maps while increasing their depth. These blocks are organized into four stages: the first stage includes a max-pooling layer followed by three basic residual blocks with 64 filters; the second stage includes four basic residual blocks with 128 filters, incorporating a down-sampling operation; the third stage contains six basic residual blocks with 256 filters, also incorporating a down-sampling operation; and the fourth stage comprises three basic residual blocks with 512 filters, further reducing the spatial dimensions.
The bottleneck section bridges the encoder and decoder, processing the deepest feature maps through multiple convolutional layers. The first bottleneck convolves 512 filters into 256 filters, the second reduces 256 filters to 128 filters, and subsequent bottlenecks reduce 128 filters to 64 filters.
The decoder reconstructs the high-resolution road masks by progressively up-sampling the feature maps and incorporating skip connections from the encoder to retain spatial details. Each decoder block consists of an up-sampling operation followed by a convolutional layer and ReLU activation.
The final output layer applies a 3 × 3 convolution to generate a mask where each pixel is given a probability of being road (1) or non-road (0). The inputs to our network consist of high-resolution SAR images of different sizes from the HybridSAR Road Dataset (HSRD), which includes real SAR images (SN6R) and synthetic SAR images (SN3-SAR and DG-SAR). Each input image is a three-channel image representing SAR intensity values from different polarizations.
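For concreteness, the sketch below shows one way to assemble a ResNet34-encoder U-Net ("ResUNet") matching the description above, using torchvision. The decoder channel widths, the nearest-neighbor up-sampling, and the final sigmoid are assumptions; the authors' CRESIv2 configuration may differ in detail.

```python
# Minimal ResNet34-encoder U-Net sketch (not the authors' exact implementation).
import torch
import torch.nn as nn
from torchvision.models import resnet34

class DecoderBlock(nn.Module):
    """Upsample, concatenate the encoder skip feature, then convolve."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip=None):
        x = self.up(x)
        if skip is not None:
            x = torch.cat([x, skip], dim=1)
        return self.conv(x)

class ResUNet(nn.Module):
    def __init__(self, pretrained: bool = True):
        super().__init__()
        enc = resnet34(weights="IMAGENET1K_V1" if pretrained else None)
        self.stem = nn.Sequential(enc.conv1, enc.bn1, enc.relu)   # 7x7/2 conv block
        self.pool = enc.maxpool
        self.layer1, self.layer2 = enc.layer1, enc.layer2          # 64- and 128-filter stages
        self.layer3, self.layer4 = enc.layer3, enc.layer4          # 256- and 512-filter stages
        self.dec4 = DecoderBlock(512, 256, 256)
        self.dec3 = DecoderBlock(256, 128, 128)
        self.dec2 = DecoderBlock(128, 64, 64)
        self.dec1 = DecoderBlock(64, 64, 64)
        self.dec0 = DecoderBlock(64, 0, 32)
        self.head = nn.Conv2d(32, 1, 3, padding=1)                  # per-pixel road logit

    def forward(self, x):
        s0 = self.stem(x)                 # 1/2 resolution, 64 channels
        s1 = self.layer1(self.pool(s0))   # 1/4, 64
        s2 = self.layer2(s1)              # 1/8, 128
        s3 = self.layer3(s2)              # 1/16, 256
        s4 = self.layer4(s3)              # 1/32, 512
        d = self.dec4(s4, s3)
        d = self.dec3(d, s2)
        d = self.dec2(d, s1)
        d = self.dec1(d, s0)
        d = self.dec0(d)                  # back to full resolution
        return torch.sigmoid(self.head(d))

# model = ResUNet(); model(torch.randn(1, 3, 512, 512)).shape -> (1, 1, 512, 512)
```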
Our ResUNet model is initialized with ImageNet-pretrained weights to leverage the general feature extraction capabilities learned from a vast collection of natural images. This transfer learning approach accelerates convergence and enhances the performance of the model on SAR imagery. We employed a combination of binary cross-entropy (BCE) loss and Jaccard loss to optimize the model, enhancing both pixel-wise accuracy and overlap between predicted and true road masks. The AdamW optimizer was used to train the model, with a learning rate schedule adjusted during the pre-training and fine-tuning stages.
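A minimal sketch of the combined BCE and Jaccard (soft IoU) loss follows. The weighting factor alpha, and the learning rate and weight decay shown in the usage comment, are assumptions; the paper does not state these values.

```python
# Sketch of the combined BCE + Jaccard loss used to train the road segmentation model.
import torch
import torch.nn as nn

class BCEJaccardLoss(nn.Module):
    def __init__(self, alpha: float = 0.75, eps: float = 1e-7):
        super().__init__()
        self.alpha, self.eps = alpha, eps
        self.bce = nn.BCELoss()   # the network outputs probabilities (sigmoid head)

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        bce = self.bce(pred, target)
        inter = (pred * target).sum()
        union = pred.sum() + target.sum() - inter
        jaccard = 1.0 - (inter + self.eps) / (union + self.eps)   # soft IoU loss
        return self.alpha * bce + (1.0 - self.alpha) * jaccard

# criterion = BCEJaccardLoss()
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)  # illustrative values
```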
Data Sampling Strategy As demonstrated in Figure 9, a large number of the SN6R tiles overlap, which stems from the 202 heavily overlapping SAR image strips from which the image tiles are derived. These overlaps enrich the dataset’s diversity by incorporating images captured at different times and angles, but they also pose a significant risk of data leakage, where tiles in the testing set may closely resemble those in the training set. To address this issue, we adopted a data sampling strategy inspired by ref. [26], systematically dividing the overall area into ten distinct, non-overlapping sub-regions. Each of the 3401 tiles was allocated to a sub-region if more than half of the tile’s area was located within it, ensuring that no tile appeared in more than one subset. Subsequently, we designated three subsets as the testing set and one as the validation set, with the remaining six earmarked for training. This approach effectively minimizes the chance of overlap between the training and testing datasets, thereby preventing data leakage and bolstering the integrity of our model evaluation process. The final data split of all of the datasets used to train our model is shown in Table 4.
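The tile-to-sub-region assignment can be sketched as follows, assuming ten hand-defined, non-overlapping polygons covering the Rotterdam area; the polygons themselves and the handling of ambiguous tiles are assumptions.

```python
# Sketch of the tile-to-sub-region assignment used to avoid train/test leakage.
import rasterio
from shapely.geometry import Polygon, box

def assign_tile(tile_path: str, subregions: list[Polygon]) -> int | None:
    """Return the index of the sub-region containing more than half of the tile footprint."""
    with rasterio.open(tile_path) as src:
        footprint = box(*src.bounds)                       # tile footprint as a polygon
    for idx, region in enumerate(subregions):
        if footprint.intersection(region).area > 0.5 * footprint.area:
            return idx                                     # sub-region index 0..9
    return None                                            # ambiguous tiles can be dropped

# Example split: sub-regions 0-5 -> training, 6 -> validation, 7-9 -> testing.
```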
5. Experiment
5.1. Experimental Settings
Our road segmentation models were developed using a single Nvidia RTX A5000 GPU (NVIDIA Corporation, Santa Clara, CA, USA) and the PyTorch framework. Training incorporated common data augmentation techniques, including horizontal flipping, random rotation, and random cropping, to improve model robustness. We employed the AdamW optimizer and combined binary cross-entropy (BCE) loss with Jaccard loss, with the final loss being a combination of the two. During prediction, we standardized road classification by setting a threshold of 0.1: pixels with a prediction probability above 0.1 were identified as roads, enhancing consistency in our model’s evaluations.
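The augmentation pipeline and the 0.1 prediction threshold can be sketched as follows. The choice of the albumentations library and the crop size are assumptions; the paper names only the augmentation operations.

```python
# Sketch of the training augmentations and the prediction threshold.
import numpy as np
import albumentations as A

train_aug = A.Compose([
    A.HorizontalFlip(p=0.5),             # horizontal flipping
    A.Rotate(limit=45, p=0.5),           # random rotation
    A.RandomCrop(height=512, width=512)  # random cropping (crop size is illustrative)
])

def binarize(prob_mask: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Pixels with a predicted road probability above 0.1 are classified as road."""
    return (prob_mask > threshold).astype(np.uint8)
```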
Setting 1: HybridSAR (Pre-Training with SN6R, SN3-SAR, and DG-SAR) In this setting, we pre-trained the model on the training sets of the HybridSAR Road Dataset (SN6R, SN3-SAR, and DG-SAR). During pre-training, images were randomly cropped to the DeepGlobe tile size. The initial learning rate was set at . We trained for 500 epochs, saving the model every 50 epochs. The final model was chosen based on the best performance on the validation set.
In the fine-tuning phase, we continued with only the SN6R training set but adjusted the learning rate to . During this step, the images were randomly cropped to a larger tile size, as empirical results showed that feeding larger image tiles at this stage improves the performance of the model. We trained for another 200 epochs and saved the model every 10 epochs.
Setting 2: OpticalSAR (Pre-Training with SN6R, SN3, and DG) To demonstrate the necessity of using synthesized SAR images over directly using optical images, we trained an additional model on the training sets of SN6R and the EO road datasets SN3 and DG. The pre-training and fine-tuning stages were the same as in Setting 1, except that we used the EO datasets instead of the synthetic SAR datasets SN3-SAR and DG-SAR.
Setting 3: OnlySAR (In-Domain Training and Testing) To illustrate the benefits of leveraging the additional datasets, we conducted training sessions for models exclusively within their respective domains. This involved training and testing within the same dataset or subset for SN6R, SN3-SAR, and DG-SAR, independently. For each dataset, the initial learning rate was set at , and we trained for 200 epochs. During training, images were randomly cropped to the maximum size allowed. The performance was evaluated with the testing set of the same dataset or subset, ensuring a focused assessment of the model’s capability within each dataset’s context.
5.2. Evaluation Metrics
We evaluated our model using the recall, precision, IoU, and F1-score.
Recall We used recall to ensure that our model is capable of identifying the majority of true road segments, emphasizing the model’s sensitivity to road features and minimizing missed detections. The formula is

$$\mathrm{Recall} = \frac{TP}{TP + FN},$$

where TP (True Positives) is the number of correctly identified road pixels, and FN (False Negatives) is the number of road pixels that were not detected.

Precision Precision was employed to assess the accuracy of the road segments predicted by our model, highlighting its ability to avoid misclassifying non-road pixels as roads, thereby ensuring the reliability of detected roads. The formula is

$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$
Here, FP (False Positives) is the number of non-road pixels incorrectly classified as road.
IoU The IoU (Intersection over Union) represents the overlap between the predicted road segments and the ground truth over their union. The IoU offers a balanced view of both the model’s precision and recall, which is crucial for evaluating the exactness of road delineation. The formula is

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}.$$
F1-score The F1-score combines precision and recall into a single metric by calculating their harmonic mean, offering a balanced measure of the model’s performance. The formula is

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
These four metrics are widely used in the evaluation of image segmentation models. To account for inconsistent road widths across labeling schemes, we used a four-pixel buffer when calculating the metrics, accommodating minor misalignments in road predictions.
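A sketch of this buffered (relaxed) evaluation is given below: a predicted road pixel counts as correct if it lies within a four-pixel buffer of a labeled road pixel, and vice versa. Deriving a buffered IoU from the relaxed precision and recall is an assumption; the authors' exact buffering scheme may differ.

```python
# Sketch of buffered precision/recall/IoU/F1 for binary road masks (values 0/1).
import numpy as np
from scipy.ndimage import binary_dilation

def buffered_scores(pred: np.ndarray, gt: np.ndarray, buffer_px: int = 4):
    pred, gt = pred.astype(bool), gt.astype(bool)
    struct = np.ones((2 * buffer_px + 1, 2 * buffer_px + 1), dtype=bool)
    gt_buf = binary_dilation(gt, structure=struct)      # tolerance zone around labels
    pred_buf = binary_dilation(pred, structure=struct)  # tolerance zone around predictions

    precision = (pred & gt_buf).sum() / max(pred.sum(), 1)   # predicted pixels near a label
    recall = (gt & pred_buf).sum() / max(gt.sum(), 1)        # label pixels near a prediction
    f1 = 2 * precision * recall / max(precision + recall, 1e-7)
    iou = precision * recall / max(precision + recall - precision * recall, 1e-7)
    return precision, recall, iou, f1
```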
6. Results
6.1. Road Segmentation Results on SN6R
We present the results of the three models in Table 5. The HybridSAR model emerges as the superior choice: it achieves the highest Intersection over Union (IoU) and F1-score among the compared models, indicating a robust ability to accurately segment road areas while maintaining a good balance of precision and recall. In contrast, while the OpticalSAR model shows a slightly higher precision, suggesting a more cautious prediction behavior, it falls short on other metrics such as recall and IoU, indicating a potential under-detection of true positives. The visual results in Figure 10 show that the OpticalSAR model sometimes omits small segments of roads.
Meanwhile, the OnlySAR model, despite its high recall, which indicates that it misses fewer actual road segments, suffers significantly in precision. This high recall comes at the expense of a greater number of false positives, as reflected by its lower IoU and F1-score, undermining its overall segmentation effectiveness. Looking at the direct outputs in Figure 10, we observe that the OnlySAR model has difficulty delineating the edges of roads, which explains its high recall and low precision.
6.2. Testing the Model Performance on Datasets from Other Locations
To demonstrate the generalizability of our model to different geographical locations, we also applied the same framework to road segmentation on the DG-SAR dataset (covering Thailand) and the Paris subset of SN3-SAR. For each dataset, we compared a model with no pre-training on the HSRD against HybridSAR, for which the pre-training steps were the same as in the previous section. The results, displayed in Table 6 and Table 7, underscore the superior performance of the HybridSAR model across both datasets in key metrics. Models trained exclusively on either the SN3-SAR-Paris or DG-SAR datasets without pre-training show significantly poorer results, particularly on the SN3-SAR-Paris subset, where training data and location variety are limited. These findings highlight the substantial improvements in road segmentation accuracy and generalizability achieved through pre-training on the HSRD, emphasizing the effectiveness of our method across diverse geographical locations.
6.3. Applicability of Different Methods
In addition to the road segmentation model we tested, we considered other segmentation methods for their potential applicability to our task. One such method is the SpaceNet 4 Building Extraction Challenge winner, XDXD [27], and the results are presented in Table 8. Unfortunately, this method did not perform well on our SAR road segmentation task, likely due to fundamental differences between building and road segmentation. Building segmentation often deals with distinct, isolated structures, while road segmentation requires continuous, linear feature extraction across varying terrains and conditions. These differences could explain the challenges in directly applying building segmentation methods to road segmentation tasks.
7. Conclusions and Future Work
In conclusion, this paper addresses the significant challenge of road segmentation from Synthetic Aperture Radar (SAR) imagery, a task critical for applications such as urban planning and disaster management. We introduced the HybridSAR Road Dataset, the first dataset of its kind, which combines real SAR images with synthetic data derived from electro-optical (EO) datasets using a generative adversarial network (GAN). This approach mitigates the scarcity of annotated SAR data and overcomes the domain gap between EO and SAR imagery. Our enhanced training framework, integrating both real and synthetic SAR datasets, demonstrates that state-of-the-art road segmentation techniques developed for EO imagery can be effectively adapted to the SAR domain. The results showcase notable improvements in road segmentation accuracy and model generalizability across diverse geographic settings. This work not only paves the way for future advancements in SAR-based road segmentation but also highlights the potential of leveraging EO datasets and methodologies to enhance SAR data applications, ultimately contributing to the broader field of remote sensing and image analysis.
Conceptualization, T.L. and B.W.; data curation, T.L.; formal analysis, T.L.; funding acquisition, B.W.; investigation, T.L.; methodology, T.L., Y.Q., S.H. and B.W.; project administration, B.W.; resources, Y.Q. and B.W.; software, T.L.; supervision, S.H. and B.W.; validation, S.H. and B.W.; visualization, T.L.; writing—original draft, T.L.; writing—review and editing, T.L., Y.Q., S.H. and B.W. All authors have read and agreed to the published version of the manuscript.
The HSRD dataset can be downloaded from
The authors declare no conflicts of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1. (a) An electro-optical (EO) remote sensing image. (b) A Synthetic Aperture Radar (SAR) image of the same scene. (c) The rasterized road mask of the scene. The EO and SAR images are from the SN6-SAROPT [7] dataset, and the road mask is taken from OpenStreetMap.
Figure 2. The HybridSAR Road Dataset consists of one real SAR road dataset (SN6R) and two synthetic SAR road datasets (SN3-SAR and DG-SAR).
Figure 3. Sample images and masks from the major road datasets. (a) The DeepGlobe dataset provides labels as 512 × 512 road masks corresponding to the road surface. (b) The SpaceNet 3 Road dataset provides road centerline vectors that can be rasterized for the 1300 × 1300 image tiles. (c) The Massachusetts dataset, where OSM road layers are extracted and rasterized for the 1500 × 1500 image tiles.
Figure 4. Overview of the HybridSAR Road Dataset (HSRD) package creation framework. 1. Construction of the SN6R dataset by merging road layers from OpenStreetMap (OSM) with SAR image tiles from the SpaceNet 6 Building (SN6B) dataset. 2. Transformation of two optical road datasets, DeepGlobe (DG) and SpaceNet 3 (SN3), into synthetic SAR datasets (DG-SAR and SN3-SAR) through an EO2SAR translation model.
Figure 5. Detailed process for deriving road masks from OSM for the SN6R dataset. Steps include (1) extracting coordinate information from the geo-referenced SN6B image file, (2) rasterizing OSM road vectors into binary images, (3) clipping the binary image with the coordinates, and (4) applying filtering and other post-processing methods to finalize the road mask tile.
Figure 6. The workflow of synthesizing SAR images from optical SpaceNet 3 images. (1) The 1300 × 1300 SN3 image is split into multiple 512 × 512 pieces. (2) Each piece is then fed through the EO2SAR generator of the DDAGAN model, a specialized optical-to-SAR GAN, effectively transforming the optical data into SAR-like imagery. (3) The SAR-styled pieces are stitched back together, reconstructing the full image. (4) This composite image is subsequently normalized by dividing it by a coverage map, which accounts for overlapping areas during the stitching process. (5) The final normalized SAR map, ready for further analysis or application.
Figure 7. Demonstration of the re-masking step for the obstructed areas in the process of creating the SN3-SAR dataset. (a) The obstructed SN3 optical images; (b) the direct output of the DDAGAN, after stitching, without re-masking; (c) the final output, after the re-masking step.
Figure 8. Qualitative visualization results of synthetic SAR data. (a) Original DeepGlobe image tiles; (b) corresponding synthetic DG-SAR image tiles produced by DDAGAN; (c) original SpaceNet 3 image tiles; (d) corresponding synthetic SN3-SAR image tiles produced by DDAGAN.
Figure 9. Visualization of the coverage of the SN6R training/validation/testing sets, with the map of Rotterdam as the basemap reference. Each square tile in the figure reflects the location of the corresponding SAR image tile, with white tiles in the training set, blue tiles in the validation set, and red tiles in the testing set. As demonstrated, a large number of tiles overlap, which calls for a dedicated data sampling strategy. The basemap is provided by ESRI World Imagery.
Figure 10. Segmentation results of the trained models: (a) the corresponding RGB image (for reference); (b) the SN6R SAR input image; (c) the ground truth road mask; (d) the prediction result of OnlySAR; (e) the prediction result of OpticalSAR; (f) the prediction result of HybridSAR. The red boxes highlight the differences in the segmentation results.
A detailed summary of the HybridSAR Road Dataset (HSRD).
Dataset | # of Images | Spatial Resolution | Tile Size (px) | Region of Interest (RoI) | SAR Image Type | Road Label Type |
---|---|---|---|---|---|---|
SN6R | 3401 | 0.5 m | 900 | Rotterdam | real | OSM |
SN3-SAR | 2780 | 0.3 m | 1300 | Vegas, Paris, Shanghai, Khartoum | synthetic | manual |
DG-SAR | 6226 | 0.5 m | 512 | Thailand | synthetic | manual |
Comparison of popular optical road segmentation datasets with image tiling and labeling schemes.
Dataset | Geo-Referenced (Tiling) | Obstructed (Tiling) | Road Width (Labeling) | Label Format (Labeling) |
---|---|---|---|---|
SpaceNet 3 | ✓ | ✓ | constant | geospatial vector data |
DeepGlobe | × | × | varying | image mask |
Massachusetts | ✓ | ✓ | 7 pixels | image mask |
Overview of the SpaceNet 3 and DeepGlobe datasets, as well as the SN6-SAROPT dataset that was used to train the DDAGAN model.
Dataset | # of Image Tiles | Size (px) | Spatial Resolution | Regions of Interest |
---|---|---|---|---|
SpaceNet 3 (SN3) | 2780 | 1300 | 0.3 m | Vegas, Paris, Shanghai, Khartoum |
DeepGlobe (DG) | 6226 | 1024 | 0.5 m | Thailand |
SN6-SAROPT | 724 × 2 | 512 | 0.5 m | Rotterdam |
The number of image tiles in the training, validation, and testing sets in the SN6R, SN3-SAR, and DG-SAR datasets, respectively.
Dataset | Training | Validation | Testing | Total |
---|---|---|---|---|
SN6R | 2286 | 169 | 946 | 3401 |
DG-SAR | 4150 | 218 | 1858 | 6226 |
SN3-SAR | 1852 | 98 | 830 | 2780 |
SN3-SAR Paris (sub-set) | 206 | 11 | 93 | 310 |
SN3-SAR Vegas (sub-set) | 661 | 35 | 293 | 989 |
SN3-SAR Shanghai (sub-set) | 798 | 42 | 358 | 1198 |
SN3-SAR Khartoum (sub-set) | 187 | 10 | 86 | 283 |
Performance of the three models on road segmentation on the SN6R test set. HybridSAR achieves the best scores in terms of IoU and F1-score.
Model | Recall | Precision | IoU | F1-Score |
---|---|---|---|---|
HybridSAR | 56.70 | 50.27 | 37.39 | 52.11 |
OpticalSAR | 52.13 | 52.51 | 36.58 | 50.97 |
OnlySAR | 59.24 | 35.72 | 28.54 | 42.88 |
Road segmentation results on the SN3-SAR Paris dataset.
Model | Recall | Precision | IoU | F1-Score |
---|---|---|---|---|
OnlySAR | 58.93 | 51.27 | 36.80 | 49.35 |
HybridSAR | 64.50 | 66.08 | 46.47 | 59.39 |
Road segmentation results on the DG-SAR dataset.
Model | Recall | Precision | IoU | F1-Score |
---|---|---|---|---|
OnlySAR | 74.33 | 71.20 | 57.13 | 70.64 |
HybridSAR | 77.34 | 68.75 | 57.42 | 71.12 |
Performance comparison of the XDXD model, which was designed for building segmentation, with the HybridSAR model used in our study.
Model | Recall | Precision | IoU | F1-Score |
---|---|---|---|---|
XDXD | 92.50 | 7.11 | 7.06 | 13.03 |
HybridSAR | 56.70 | 50.27 | 37.39 | 52.11 |
References
1. Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-Of-The-Art Review. Remote Sens.; 2020; 12, 1444. [DOI: https://dx.doi.org/10.3390/rs12091444]
2. Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; Salt Lake City, UT, USA, 18–22 June 2018; pp. 172-181. [DOI: https://dx.doi.org/10.1109/CVPRW.2018.00031]
3. Van Etten, A.; Lindenbaum, D.; Bacastow, T.M. SpaceNet: A Remote Sensing Dataset and Challenge Series. arXiv; 2018; [DOI: https://dx.doi.org/10.48550/arxiv.1807.01232] arXiv: 1807.01232
4. Huang, B.; Li, Y.; Han, X.; Cui, Y.; Li, W.; Li, R. Cloud Removal from Optical Satellite Imagery with SAR Imagery Using Sparse Representation. IEEE Geosci. Remote Sens. Lett.; 2015; 12, pp. 1046-1050. [DOI: https://dx.doi.org/10.1109/LGRS.2014.2377476]
5. Brunner, D.; Lemoine, G.; Bruzzone, L. Earthquake Damage Assessment of Buildings Using VHR Optical and SAR Imagery. IEEE Trans. Geosci. Remote Sens.; 2010; 48, pp. 2403-2420. [DOI: https://dx.doi.org/10.1109/TGRS.2009.2038274]
6. Wang, T.L.; Jin, Y.Q. Postearthquake Building Damage Assessment Using Multi-Mutual Information from Pre-Event Optical Image and Postevent SAR Image. IEEE Geosci. Remote Sens. Lett.; 2011; 9, pp. 452-456. [DOI: https://dx.doi.org/10.1109/LGRS.2011.2170657]
7. Qing, Y.; Zhu, J.; Feng, H.; Liu, W.; Wen, B. Two-Way Generation of High-Resolution EO and SAR Images via Dual Distortion-Adaptive GANs. Remote Sens.; 2023; 15, 1878. [DOI: https://dx.doi.org/10.3390/rs15071878]
8. OpenStreetMap Contributors. Planet Dump Retrieved from https://planet.osm.org. 2017; Available online: https://www.openstreetmap.org (accessed on 30 March 2023).
9. Shermeyer, J.; Hogan, D.; Brown, J.; Van Etten, A.; Weir, N.; Pacifici, F.; Hansch, R.; Bastidas, A.; Soenen, S.; Bacastow, T. et al. Spacenet 6: Multi-Sensor All Weather Mapping Dataset. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Seattle, WA, USA, 14–19 June 2020; Volume 2020-June, [DOI: https://dx.doi.org/10.1109/CVPRW50498.2020.00106]
10. Zhu, Q.; Zhang, Y.; Wang, L.; Zhong, Y.; Guan, Q.; Lu, X.; Zhang, L.; Li, D. A Global Context-aware and Batch-independent Network for Road Extraction from VHR Satellite Imagery. ISPRS J. Photogramm. Remote Sens.; 2021; 175, pp. 353-365. [DOI: https://dx.doi.org/10.1016/j.isprsjprs.2021.03.016]
11. Jing, R.; Gong, Z.; Zhu, W.; Guan, H.; Zhao, W. Island Road Centerline Extraction Based on a Multiscale United Feature. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2018; 11, pp. 3940-3953. [DOI: https://dx.doi.org/10.1109/JSTARS.2018.2872520]
12. Maboudi, M.; Amini, J.; Malihi, S.; Hahn, M. Integrating Fuzzy Object Based Image Analysis and Ant Colony Optimization for Road Extraction from Remotely Sensed Images. ISPRS J. Photogramm. Remote Sens.; 2018; 138, pp. 151-163. [DOI: https://dx.doi.org/10.1016/j.isprsjprs.2017.11.014]
13. Chen, L.; Zhu, Q.; Xie, X.; Hu, H.; Zeng, H. Road Extraction from VHR Remote-Sensing Imagery via Object Segmentation Constrained by Gabor Features. ISPRS Int. J. Geo-Inf.; 2018; 7, 362. [DOI: https://dx.doi.org/10.3390/ijgi7090362]
14. Wang, J.; Qin, Q.; Gao, Z.; Zhao, J.; Ye, X. A New Approach to Urban Road Extraction Using High-resolution Aerial Image. ISPRS Int. J. Geo-Inf.; 2016; 5, 114. [DOI: https://dx.doi.org/10.3390/ijgi5070114]
15. Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; Salt Lake City, UT, USA, 18–22 June 2018; pp. 192-1924. [DOI: https://dx.doi.org/10.1109/CVPRW.2018.00034]
16. Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett.; 2018; 15, pp. 749-753. [DOI: https://dx.doi.org/10.1109/LGRS.2018.2802944]
17. Etten, A.V. City-Scale Road Extraction from Satellite Imagery v2: Road Speeds and Travel Times. Proceedings of the IEEE Winter Conference on Applications of Computer Vision; Snowmass Village, CO, USA, 1–5 March 2020; pp. 1775-1784. [DOI: https://dx.doi.org/10.1109/WACV45572.2020.9093593]
18. Chen, K.; Zou, Z.; Shi, Z. Building Extraction from Remote Sensing Images with Sparse Token Transformers. Remote Sens.; 2021; 13, 4441. [DOI: https://dx.doi.org/10.3390/rs13214441]
19. Vargas-Munoz, J.E.; Srivastava, S.; Tuia, D.; Falcao, A.X. OpenStreetMap: Challenges and Opportunities in Machine Learning and Remote Sensing. IEEE Geosci. Remote Sens. Mag.; 2021; 9, pp. 184-199. [DOI: https://dx.doi.org/10.1109/MGRS.2020.2994107]
20. Mattyus, G.; Wang, S.; Fidler, S.; Urtasun, R. HD Maps: Fine-Grained Road Segmentation by Parsing Ground and Aerial Images. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA, 27–30 June 2016; Volume 2016-December, [DOI: https://dx.doi.org/10.1109/CVPR.2016.393]
21. Bastani, F.; He, S.; Abbar, S.; Alizadeh, M.; Balakrishnan, H.; Chawla, S.; Madden, S.; Dewitt, D. RoadTracer: Automatic Extraction of Road Networks from Aerial Images. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Salt Lake City, UT, USA, 18–22 June 2018; [DOI: https://dx.doi.org/10.1109/CVPR.2018.00496]
22. Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis; University of Toronto: Toronto, ON, Canada, 2013.
23. Sumbul, G.; Charfuelan, M.; Demir, B.; Markl, V. BIGEARTHNET: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium; Yokohama, Japan, 28 July–2 August 2019; pp. 5901-5904.
24. Zhao, J.; Zhang, Z.; Yao, W.; Datcu, M.; Xiong, H.; Yu, W. OpenSARUrban: A Sentinel-1 SAR Image Dataset for Urban Interpretation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2020; 13, pp. 187-203. [DOI: https://dx.doi.org/10.1109/JSTARS.2019.2954850]
25. Kong, L.; Lian, C.; Huang, D.; Li, Z.; Zhou, Q.; Hu, Y. Breaking the Dilemma of Medical Image-to-Image Translation. Adv. Neural Inf. Process. Syst.; 2021; 3, pp. 1964-1978.
26. Wangiyana, S. [SN6] Splitting Image Tiles. 2021; Available online: https://www.kaggle.com/code/sandhiwangiyana/sn6-splitting-image-tiles (accessed on 30 March 2023).
27. Weir, N. The SpaceNet Challenge Off-Nadir Buildings: Introducing the Winners. 2019; Available online: https://medium.com/the-downlinq/the-spacenet-challenge-off-nadir-buildings-introducing-the-winners-b60f2b700266 (accessed on 30 April 2023).
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In this study, we tackle the task of road segmentation from Synthetic Aperture Radar (SAR) imagery, which is vital for remote sensing applications including urban planning and disaster management. Despite its significance, SAR-based road segmentation is hindered by the scarcity of high-resolution, annotated SAR datasets and the distinct characteristics of SAR imagery, which differ significantly from more commonly used electro-optical (EO) imagery. To overcome these challenges, we introduce a multi-source data approach, creating the HybridSAR Road Dataset (HSRD). This dataset includes the SpaceNet 6 Road (SN6R) dataset, derived from high-resolution SAR images and OSM road data, as well as the DG-SAR and SN3-SAR datasets, synthesized from existing EO datasets. We adapt an off-the-shelf road segmentation network from the optical to the SAR domain through an enhanced training framework that integrates both real and synthetic data. Our results demonstrate that the HybridSAR Road Dataset and the adapted network significantly enhance the accuracy and robustness of SAR road segmentation, paving the way for future advancements in remote sensing.