1. Introduction
Semantic segmentation assigns a semantic label to each pixel in an image. General semantic segmentation methods are mostly trained on images with clear details captured under good lighting conditions and achieve strong results on general segmentation benchmarks. However, when applied directly to nighttime driving scenes, their segmentation accuracy degrades and image content is recognized inaccurately. Nighttime driving-scene segmentation has promising applications in autonomous driving [1,2,3], security monitoring [4,5], and environmental monitoring [6,7]. Therefore, it is of great significance to design more accurate segmentation methods for nighttime driving scenes.
Currently, most nighttime semantic segmentation methods adopt an unsupervised domain adaptation (UDA) strategy, which transfers knowledge learned from labeled source domains to unlabeled target domains to bridge the gap between the daytime and nighttime image domains. UDA methods that extract domain-invariant representations through image transformation models [8,9,10,11,12] and adversarial learning [13,14,15,16,17] have achieved impressive results. However, due to the weak supervision on target-domain instances, these methods cannot completely eliminate the domain gap, and a significant performance gap remains compared with supervised methods [18,19,20,21].
Nighttime driving-scene segmentation faces two main challenges. The first is that it is difficult to obtain large-scale labeled datasets of nighttime driving scenes because of poor nighttime perception. Deep learning is essentially data-driven, and the standard strategy for achieving high-performance segmentation is to train neural networks on large amounts of labeled nighttime data. However, collecting and labeling nighttime images is labor-intensive: manually annotating a Cityscapes image takes about 90 min [22], while for adverse-condition datasets such as ACDC [23] the annotation time exceeds three hours. In addition, it is very difficult to produce high-quality pixel-level annotations because of the large dark or shadowed regions in nighttime images. The second challenge is that nighttime images often suffer from uneven illumination, low brightness, and color deviation because of the limited lighting conditions. As a result, the features extracted by convolutional layers differ significantly from those obtained under good lighting, and models trained on daytime road datasets cannot be directly applied to nighttime driving scenes.
A nighttime driving-scene segmentation method based on a light-enhanced network is proposed to address these issues. First, a generative adversarial network [24] (GAN) is employed to convert daytime images into nighttime images. Second, the proposed light-enhanced network applies color correction to the nighttime images, and adaptive enhancement is then performed based on the brightness, contrast, and exposure information of the corrected image. Finally, the output of the light-enhanced network is fed into the segmentation network to obtain the segmentation prediction. This approach alleviates the low brightness and low contrast of nighttime images and improves nighttime driving-scene segmentation performance. The proposed method was evaluated on the nighttime driving-scene segmentation dataset Dark Zurich-test; compared with other nighttime driving-scene segmentation methods, it effectively improves segmentation accuracy.
The main contributions of this work can be summarized as follows:
Considering the scarcity of labeled nighttime driving-scene data, a generative adversarial network is used to convert daytime images from the Cityscapes dataset into nighttime images, creating the TransCity dataset, which is then used for subsequent training.
Considering the low brightness and color bias of nighttime driving-scene images, a light-enhanced network is proposed, which comprises two parts: a color correction module and a parameter predictor. The light-enhanced network highlights color details and improves the visibility of the image.
Experimental results on the Dark Zurich-test show that the proposed method outperforms existing nighttime driving-scene segmentation algorithms.
2. Related Work
2.1. Image Generation
GAN technology has been widely used in image generation. For pixel-to-pixel image generation, researchers [25,26] proposed several GAN-based solutions; however, these methods typically require paired image sets, which can be difficult to obtain. To address this problem, Chen et al. [27] proposed CartoonGAN, which uses unpaired training data to learn a mapping between real photographs and cartoon images and thus synthesizes high-quality cartoon images. Inspired by this, we apply CartoonGAN to the style transformation between daytime and nighttime driving-scene images. However, using CartoonGAN alone may introduce colored light spots in the synthesized images. Therefore, we add a side window box filter module, propose a method better suited to day-to-night driving-scene image conversion, and construct a nighttime driving-scene dataset, TransCity.
2.2. Nighttime Driving-Scene Semantic Segmentation
Semantic segmentation plays a pivotal role in many computer vision understanding systems. Long et al. [28] proposed the fully convolutional network, which replaces the fully connected layers used for image classification with convolutional layers, enabling the network to generate segmentation masks for images of any size. Lin et al. [29] proposed RefineNet, which employs multi-scale analysis to extract the global context of images while retaining low-level details. However, these generic segmentation networks perform poorly when applied to nighttime driving scenes. Therefore, researchers [30,31] turned to adapting models to adverse conditions. Dai et al. [9] first used an intermediate twilight domain to gradually adapt a semantic model trained on daytime driving scenes to nighttime driving scenes. Sakaridis et al. [10,11] extended this into guided curriculum adaptation methods, proposing segmentation models that gradually adapt from day to night. However, such progressive adaptation usually requires training multiple segmentation models; for example, three models are used in three different domains in [9], which is inefficient. Xu et al. [12] proposed CDAda, which adapts the model through entropy minimization and self-training. Some recent works also use GANs to reduce the domain gap; for example, the methods in [13,14] use GANs to learn a mapping from input images to output images and improve segmentation from two perspectives: direct inference on nighttime driving-scene images and real-time online conversion of nighttime images through style transformation. Wu et al. [15,16] proposed unsupervised one-stage adaptation methods, in which an image re-illumination network is placed at the head of the segmentation network and adversarial learning is used to align labeled daytime data with unlabeled nighttime data. Gao et al. [17] proposed a method based on cross-domain correlation distillation, which exploits the illumination invariance between two images to compensate for the lack of annotations in night-scene images. Tan et al. [18] proposed EGNet, which designs an exposure stream that learns exposure-related features by explicitly predicting an exposure map, and then uses an attention mechanism to inject the learned features into the segmentation stream, improving the recognition and segmentation of over-exposed and under-exposed regions. Wang et al. [19] proposed the nighttime driving-scene segmentation framework SFNet-N, which consists of a light-enhanced network and a segmentation network: the light-enhanced network learns a set of optimal illumination fitting curves from pixel-level and semantic views, reconstructing details in the dark without distortion, while the segmentation network fully exploits pixel context to minimize information loss in the decoder. However, SFNet-N expands the dataset through synthetic data collection and a style transformation network to balance the category distribution, which increases labor and material costs. Li et al. [20] proposed Trans-nightSeg, which improves semantic segmentation performance through an adaptive enhancement network.
However, Trans-nightSeg ignores the yellowish color cast that illumination introduces in nighttime images, and directly applying the enhancement network amplifies this cast, so the details of the enhanced image remain unclear. Liu et al. [21] proposed dual image-adaptive learnable filters to improve segmentation performance under nighttime driving conditions by exploiting the inherent characteristics of driving-scene images under different lighting conditions. However, their enhancement network also ignores the color bias that arises under different nighttime illumination, which again results in unclear details in the enhanced images. To address these problems, a semantic segmentation method for nighttime driving scenes based on a light-enhanced network is proposed.
3. TransCity Dataset
Following the idea of style transfer, we divide real images into two categories according to style: daytime and nighttime. The daytime images come from the Cityscapes dataset, and the nighttime images come from the BDD dataset [31]. The daytime images are used as the source domain and the nighttime images as the target domain. First, the source-domain and target-domain images are fed into the generative adversarial network for adversarial training. The network fully learns the style features of the target-domain data and generates images with the target-domain style from the content of the source-domain data. The original CartoonGAN [27] over-learns the style of the target domain during day-to-night conversion, producing colored light spots in the generated images. In this paper, we improve the style conversion network of the original CartoonGAN to solve this problem: before the target-domain image is input into the discriminator, a side window box filter module is added to adapt the network to day-to-night style conversion. The principle of the side window box filter is as follows (Figure 1).
For any pixel point $i$, define a set of side windows in the continuous case, as shown in Figure 1a, containing the parameters $\theta$, $\rho$ and $r$, where $\theta$ is the angle between the window and the horizontal line, $r$ is the window radius, $\rho$ controls the window length, and $\rho \in \{0, r\}$. The orientation of the window can be controlled by changing the value of $\theta$; with $\theta$ fixed, changing the value of $\rho$ controls the longitudinal length of the window.
We define eight types of windows, with orientations denoted L (left), R (right), U (up), D (down), NW, NE, SW and SE, as shown in Figure 1b. These eight windows satisfy $\theta = k\pi/2$ ($k = 0, 1, 2, 3$) and $\rho \in \{0, r\}$.
The steps of side window box filtering are as follows:
Input the image $q$ and the parameters: $i$ is the pixel to be processed, and $j$ is an adjacent pixel within the window.
Define the side window group $S = \{L, R, U, D, NW, NE, SW, SE\}$.
Calculate the result $I'_{n,i}$ after pixel $i$ is filtered by the different side windows $\omega_i^n$, as shown in Formula (1), where $N_n$ is the sum of the weights of a single side window, as shown in Formula (2):

$$I'_{n,i} = \frac{1}{N_n}\sum_{j \in \omega_i^n} \omega_{ij}\, q_j \tag{1}$$

$$N_n = \sum_{j \in \omega_i^n} \omega_{ij} \tag{2}$$

where $\omega_i^n$ is the side window $n \in S$ centered on pixel $i$, $\omega_{ij}$ is the weight of pixel $j$ in the filter window centered on pixel $i$ (all ones for the box filter), and $q_j$ is the intensity of the input image at position $j$. Calculate the cost function $E_n$, as shown in Formula (3):

$$E_n = \left\| q_i - I'_{n,i} \right\|_2^2 \tag{3}$$

where $q_i$ is the intensity of the input image at position $i$. The best side window number $a$ is obtained, as shown in Formula (4):

$$a = \arg\min_{n \in S} E_n \tag{4}$$

With the best side window type obtained for each pixel, the input image is processed with side window box filtering, which smooths the image while preserving edges.
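To make the procedure above concrete, the following is a minimal NumPy sketch of side window box filtering under the stated assumptions (a grayscale image and uniform box weights); the function name and the default radius are illustrative choices, not the original implementation.

```python
import numpy as np

# Minimal sketch of side window box filtering (Formulas (1)-(4)), assuming a grayscale
# image and uniform (box) weights. The eight side windows follow L, R, U, D, NW, NE, SW, SE.
def side_window_box_filter(img: np.ndarray, r: int = 3) -> np.ndarray:
    H, W = img.shape
    padded = np.pad(img, r, mode="edge")

    # Each side window is described by its row/column extent relative to the center pixel.
    windows = {
        "L":  (-r, r, -r, 0), "R":  (-r, r, 0, r),
        "U":  (-r, 0, -r, r), "D":  (0, r, -r, r),
        "NW": (-r, 0, -r, 0), "NE": (-r, 0, 0, r),
        "SW": (0, r, -r, 0),  "SE": (0, r, 0, r),
    }

    results = []
    for (r0, r1, c0, c1) in windows.values():
        # Box-filter output for this side window: mean of the pixels it covers (Formulas (1)-(2)).
        acc = np.zeros_like(img, dtype=np.float64)
        count = 0
        for dy in range(r0, r1 + 1):
            for dx in range(c0, c1 + 1):
                acc += padded[r + dy : r + dy + H, r + dx : r + dx + W]
                count += 1
        results.append(acc / count)

    results = np.stack(results, axis=0)              # (8, H, W)
    cost = (img[None, ...] - results) ** 2           # Formula (3)
    best = np.argmin(cost, axis=0)                   # Formula (4)
    return np.take_along_axis(results, best[None, ...], axis=0)[0]
```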
We convert the Cityscapes daytime road-scene semantic segmentation dataset into a synthetic dataset called TransCity, which closely models nighttime road scenes and retains the same semantic segmentation annotations as Cityscapes. Sample images from the TransCity dataset are shown in Figure 2.
4. Methodology
The overall algorithm flow of our proposed method for semantic segmentation of nighttime road scenes is shown in Algorithm 1.
Algorithm 1. Overall algorithm of the proposed method
Input: the TransCity, Cityscapes and ACDC-night datasets; semantic segmentation network RefineNet $F(\cdot)$.
1: $I_T$, $L_T$ ~ TransCity
2: $I_C$, $L_C$ ~ Cityscapes
3: $I_A$, $L_A$ ~ ACDC-night
4: $\hat{I}_T$, $\hat{I}_C$, $\hat{I}_A$ ← enhancement of $I_T$, $I_C$, $I_A$ by the light-enhanced network
5: $P_T$ ← $F(\hat{I}_T)$
6: $P_C$ ← $F(\hat{I}_C)$
7: $P_A$ ← $F(\hat{I}_A)$
8: Category weights are calculated with Formulas (10) and (11)
9: The cross-entropy loss is calculated with Formula (12)
Our proposed method for semantic segmentation of nighttime road scenes involves three input datasets: the labeled synthetic nighttime dataset TransCity, the labeled daytime road-scene dataset Cityscapes, and the labeled real nighttime dataset ACDC-night. As the datasets share the same annotation format, we train on them jointly with weight sharing. The overall framework of the proposed method is shown in Figure 3. First, three images, denoted $I_T$, $I_C$ and $I_A$, are sampled separately from TransCity, Cityscapes and ACDC-night and fed into the weight-sharing light-enhanced network for image enhancement. Then, the outputs of the light-enhanced network are input into the segmentation network for collaborative prediction. Finally, the network is optimized using the segmentation loss.
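As a rough illustration of this joint training scheme, the following PyTorch-style sketch processes one batch sampled from each of the three datasets, enhances it, and accumulates a shared segmentation loss. The names `enhancer`, `segmenter`, `weighted_ce`, and the batch format are placeholders, not the authors' actual code.

```python
import torch

# Illustrative sketch of one joint training step over TransCity, Cityscapes, and ACDC-night
# with weight sharing; all component names are assumed placeholders.
def train_step(batches, enhancer, segmenter, weighted_ce, optimizer, device="cuda"):
    """batches: list of (images, labels) pairs, one sampled from each of the three datasets."""
    optimizer.zero_grad()
    total_loss = torch.zeros((), device=device)
    for images, labels in batches:
        images, labels = images.to(device), labels.to(device)
        enhanced = enhancer(images)                  # light-enhanced network (Section 4.1)
        logits = segmenter(enhanced)                 # shared RefineNet-style segmentation network
        total_loss = total_loss + weighted_ce(logits, labels)   # segmentation loss, Formula (12)
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```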
4.1. Light-Enhanced Network
Due to the restriction of illumination conditions, nighttime driving-scene images often suffer from uneven illumination, low brightness, and color bias, so a model trained on a daytime road dataset cannot be directly applied to nighttime driving scenes. Therefore, this study proposes a light-enhanced network, as shown in Figure 4. The network operates as follows. First, an image of arbitrary resolution is input, and bilinear interpolation is used to down-sample it to 256 × 256. Second, the down-sampled image is passed to the color correction module, which alleviates the impact of illumination changes on nighttime driving-scene segmentation by adjusting the color information of the image. Then, the result is passed to the parameter predictor, which predicts the parameters of the image filter from the global content of the image, such as brightness, contrast, and exposure level. The network adopts the image filter proposed in [32], which has four adjustable hyper-parameters: exposure, gamma, contrast, and sharpening. Adjusting these four parameters yields better visual quality and further improves nighttime driving-scene segmentation. The full-resolution input image can then be processed directly by the image filter with the predicted parameters. The module effectively mitigates weak illumination, unclear details, and color deviation in nighttime driving scenes. The network consists of two main parts: the color correction module and the parameter predictor.
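Since the light-enhanced network ultimately applies these four filters to the full-resolution image, the following hedged PyTorch sketch shows one plausible differentiable formulation; the exact filter definitions of [32] are not reproduced here, and the function name `apply_filters`, the 2^EV exposure model, and the unsharp-mask sharpening are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of applying the four adjustable filters (exposure, gamma, contrast, sharpening)
# with parameters predicted by the parameter predictor; formulations are common differentiable
# choices and may differ from the filter used in [32].
def apply_filters(img: torch.Tensor, params: torch.Tensor) -> torch.Tensor:
    """img: (B, 3, H, W) in [0, 1]; params: (B, 4) = (exposure, gamma, contrast, sharpen)."""
    ev, gamma, contrast, sharpen = [p.view(-1, 1, 1, 1) for p in params.unbind(dim=1)]

    out = img * torch.exp(ev * 0.6931)                    # exposure: scale intensities by 2^EV
    out = out.clamp(1e-4, 1.0) ** torch.exp(gamma)        # gamma correction with a positive exponent
    mean = out.mean(dim=(1, 2, 3), keepdim=True)
    out = mean + (out - mean) * (1.0 + contrast)          # contrast adjustment around the mean intensity
    blurred = F.avg_pool2d(out, kernel_size=5, stride=1, padding=2)
    out = out + sharpen * (out - blurred)                 # unsharp-mask style sharpening
    return out.clamp(0.0, 1.0)
```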
4.2. Color Correction Module
The purpose of color correction is to adjust the color information of the image to a correct state so that the image presents a consistent visual appearance in different environments, further alleviating the impact of illumination changes on semantic segmentation of nighttime driving scenes. We apply color correction as a pre-enhancement step for nighttime driving scenes. The implementation steps are as follows:
On the basis of the gray world and perfect reflection color correction assumptions, the correction of the B channel is expressed in quadratic ("square") form, as shown in Formula (5):

$$B'(x, y) = \mu_b B^2(x, y) + \nu_b B(x, y) \tag{5}$$

where $B(x, y)$ represents the value of pixel $(x, y)$ on the blue channel, and $\mu_b$ and $\nu_b$ are the correction coefficients of the B channel. For the gray world assumption to hold, Formula (6) must be satisfied:

$$\sum_{x, y} B'(x, y) = \sum_{x, y} G(x, y) \tag{6}$$

Combined with Formula (5), Formula (6) can be converted into Formula (7):

$$\mu_b \sum_{x, y} B^2(x, y) + \nu_b \sum_{x, y} B(x, y) = \sum_{x, y} G(x, y) \tag{7}$$

For the perfect reflection assumption to hold, Formula (8) must be satisfied:

$$\mu_b \max_{x, y} B^2(x, y) + \nu_b \max_{x, y} B(x, y) = \max_{x, y} G(x, y) \tag{8}$$

The matrix form of Formulas (7) and (8) is shown in Formula (9):

$$\begin{bmatrix} \sum B^2 & \sum B \\ \max B^2 & \max B \end{bmatrix} \begin{bmatrix} \mu_b \\ \nu_b \end{bmatrix} = \begin{bmatrix} \sum G \\ \max G \end{bmatrix} \tag{9}$$

Formula (9) is a system of two linear equations in $\mu_b$ and $\nu_b$, which can be solved by Gaussian elimination with column pivoting. The coefficients $\mu_r$ and $\nu_r$ of the R channel are solved in the same way. Finally, the corrected values are computed pixel by pixel for the B channel and the R channel using $(\mu_b, \nu_b)$ and $(\mu_r, \nu_r)$, respectively, while the G channel remains unchanged.
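The following NumPy sketch illustrates this correction under the assumptions above (an RGB image in [0, 1] with the green channel as the reference); the function name and the final clipping are illustrative choices, not the authors' exact code.

```python
import numpy as np

# Hedged sketch of the quadratic ("square") color correction in Formulas (5)-(9).
def quadratic_color_correction(img: np.ndarray) -> np.ndarray:
    r, g, b = img[..., 0], img[..., 1], img[..., 2]

    def solve_coeffs(channel: np.ndarray, reference: np.ndarray) -> np.ndarray:
        # Build the 2x2 system of Formula (9) and solve it; NumPy uses LU factorization with
        # partial pivoting, matching the column-pivot Gaussian elimination described in the text.
        A = np.array([[np.sum(channel ** 2), np.sum(channel)],
                      [np.max(channel ** 2), np.max(channel)]])
        y = np.array([np.sum(reference), np.max(reference)])
        return np.linalg.solve(A, y)                 # returns (mu, nu)

    mu_r, nu_r = solve_coeffs(r, g)
    mu_b, nu_b = solve_coeffs(b, g)
    r_corr = mu_r * r ** 2 + nu_r * r                # Formula (5) applied to the R channel
    b_corr = mu_b * b ** 2 + nu_b * b                # Formula (5) applied to the B channel
    out = np.stack([r_corr, g, b_corr], axis=-1)     # G channel remains unchanged
    return np.clip(out, 0.0, 1.0)
```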
4.3. Parameter Predictor
In image signal processing, tunable filters are commonly used for image enhancement. The hyperparameters of these filters need to be manually adjusted by experienced engineers through visual inspection in order to find suitable parameters for a wide range of scenarios. However, the adjustment process can be time-consuming. To solve this problem, we use a parameter predictor to estimate the hyperparameters.
The parameter predictor aims to predict the parameters of the image filter by understanding the global content of the image, such as brightness, tone, and exposure level. As shown in Figure 3, the parameter predictor consists of a depthwise separable convolutional layer, four convolutional blocks, a dropout layer with a rate of 0.5, and a fully connected layer. Each convolutional block consists of a 3 × 3 depthwise separable convolutional layer and a Leaky ReLU activation function. The output channels of the four convolutional blocks are 32, 64, 128, and 128, respectively. With four image filter parameters in total, the parameter predictor contains only 63 K parameters.
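For illustration, the following PyTorch sketch builds a predictor with the structure described above: a stem depthwise separable convolution, four convolutional blocks with 32/64/128/128 output channels and Leaky ReLU, dropout (p = 0.5), and a fully connected layer that outputs the four filter parameters. The strides, stem width, negative slope, and global pooling step are assumptions, so this sketch will not exactly reproduce the reported 63 K parameter count.

```python
import torch
import torch.nn as nn

# Hedged sketch of the parameter predictor; architectural details not stated in the text
# (strides, stem width, pooling) are assumptions.
def dw_separable(in_ch: int, out_ch: int, stride: int = 2) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ParameterPredictor(nn.Module):
    def __init__(self, num_params: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            dw_separable(3, 16),             # stem depthwise separable convolution
            dw_separable(16, 32),            # four convolutional blocks: 32, 64, 128, 128 channels
            dw_separable(32, 64),
            dw_separable(64, 128),
            dw_separable(128, 128),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Dropout(0.5), nn.Flatten(), nn.Linear(128, num_params))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the 256 x 256 color-corrected image; output is one filter-parameter vector per image.
        return self.head(self.features(x))
```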
4.4. Loss Function
Due to the imbalance in the number of pixels of different object categories in nighttime driving-scene images, it is difficult for the network to learn the features of small object categories, which leads to poor performance in predicting pixels of small objects. Therefore, a reweighting strategy is used to increase the network's attention to small objects; the reweighting formula is shown in Formula (10):
$$w_c = -\log(p_c) \tag{10}$$
where $p_c$ denotes the proportion of pixels in the annotated dataset that are labeled as class $c$. The lower the value of $p_c$, the higher the weight assigned, which helps the network classify the small-size categories. The weight is further normalized by Formula (11):

$$w_c' = \frac{w_c - \overline{w}}{\sigma(w)} \cdot std + avg \tag{11}$$

where $\overline{w}$ and $\sigma(w)$ are the mean and standard deviation of $w$, respectively, and $\cdot$ denotes element-wise multiplication; in training, $std$ is set to 0.1 and $avg$ to 1.0. The segmentation loss adopts a weighted cross-entropy loss function, as shown in Formula (12):

$$L_{seg} = -\frac{1}{N}\sum_{c=1}^{C} w_c' \left( y_c \odot \log P_c \right) \tag{12}$$

where $P_c$ is the $c$-th channel of the segmentation result, $N$ is the number of valid pixels in the corresponding segmentation annotation, $C$ is the number of labeled classes in the dataset, $y_c$ is the one-hot map of the ground-truth label for the $c$-th class, and $\odot$ is the element-wise (Hadamard) product, summed over all valid pixels.
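A hedged PyTorch sketch of this reweighted loss is given below; the $-\log$ frequency form used for Formula (10) is an assumption consistent with the description that rarer classes receive larger weights, and the helper names are placeholders.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the class reweighting (Formulas (10)-(11)) and the weighted
# cross-entropy loss (Formula (12)).
def class_weights(pixel_frequencies: torch.Tensor, std: float = 0.1, avg: float = 1.0) -> torch.Tensor:
    w = -torch.log(pixel_frequencies)                 # Formula (10), assumed -log frequency form
    w = (w - w.mean()) / w.std() * std + avg          # Formula (11): renormalize to std 0.1, mean 1.0
    return w

def weighted_ce(logits: torch.Tensor, labels: torch.Tensor, weights: torch.Tensor,
                ignore_index: int = 255) -> torch.Tensor:
    # logits: (B, C, H, W); labels: (B, H, W); averaged over valid (non-ignored) pixels.
    return F.cross_entropy(logits, labels, weight=weights, ignore_index=ignore_index)
```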
5. Experiments
This section introduces the evaluation indicators and the datasets, and presents the experimental study of the proposed semantic segmentation method for nighttime driving scenes. We evaluated the proposed method on the Dark Zurich-test dataset; the experiments verify its effectiveness for semantic segmentation of nighttime driving scenes.
5.1. Evaluation Indicators
We use the mean Intersection over Union (mIoU) to evaluate the performance of the proposed method; in general, a larger mIoU indicates better performance. It is expressed as Formula (13):
$$\mathrm{mIoU} = \frac{1}{K}\sum_{i=1}^{K} \frac{\sum_{k=1}^{N} p_{ii}^{k}}{\sum_{k=1}^{N}\Big(\sum_{j=1}^{K} p_{ij}^{k} + \sum_{j=1}^{K} p_{ji}^{k} - p_{ii}^{k}\Big)} \tag{13}$$

where $K$ is the total number of classes, $N$ is the number of images in the validation set, $p_{ii}^{k}$ is the number of pixels in the $k$-th sample whose predicted label and ground-truth label both belong to class $i$, and $p_{ij}^{k}$ is the number of pixels in the $k$-th sample that belong to class $i$ but are predicted as class $j$.
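The sketch below illustrates how such an mIoU score can be computed by accumulating per-class counts over all validation images in a confusion matrix; the function name and the ignore index are illustrative choices.

```python
import numpy as np

# Sketch of the mIoU computation in Formula (13): accumulate a confusion matrix over all
# validation images, then average the per-class IoU.
def mean_iou(preds, gts, num_classes: int, ignore_index: int = 255) -> float:
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):                  # one (prediction, ground-truth) pair per image
        valid = gt != ignore_index
        idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
        confusion += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(confusion)
    denom = confusion.sum(axis=0) + confusion.sum(axis=1) - tp   # TP + FP + FN per class
    iou = tp / np.maximum(denom, 1)
    return float(iou.mean())
```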
5.2. Datasets
The following datasets are used for model training and performance evaluation.
- Cityscapes [22]: Cityscapes is a semantic segmentation dataset with pixel-level annotations for daytime road scenes. It contains 5000 images at a resolution of 2048 × 1024, of which 2975 are used for training, 500 for validation, and 1525 for testing. This dataset was used as a training dataset in our experiments.
- TransCity: As described in Section 3, TransCity is obtained by style transformation of the Cityscapes dataset and shares the same semantic segmentation annotations as Cityscapes, so the two datasets have exactly the same format. This dataset was used as a training dataset in our experiments.
- ACDC-night [23]: ACDC is a large-scale semantic segmentation dataset for adverse conditions, consisting of about 4000 images captured in rain, snow, fog, and nighttime conditions. It contains 400 nighttime images with pixel-level annotations at a resolution of 1920 × 1080. This dataset was used as a training dataset in our experiments.
- Dark Zurich [10]: Dark Zurich is a large road-scene dataset for unsupervised semantic segmentation. It includes 2416 nighttime images, 2920 twilight images, and 3041 daytime images, all unlabeled, at a resolution of 1920 × 1080. In addition, Dark Zurich includes 201 nighttime images with pixel-level annotations, of which 50 are used for validation and 151 for testing, enabling quantitative evaluation. This dataset was used as the test dataset in our experiments, and we report mIoU results on the Dark Zurich-test split.
5.3. Experimental Details
We used RefineNet as the baseline model, a semantic segmentation model pretrained on the Cityscapes dataset for 150,000 iterations. During training, random cropping to 512 × 512 with a scale range of 0.5 to 1.0 and random horizontal flipping were applied to augment the training data. A stochastic gradient descent (SGD) optimizer with momentum 0.9 and weight decay 5 × 10−4 was used to train the model, with an initial learning rate of 2.5 × 10−4 that was gradually reduced using the poly learning rate schedule; the batch size was four.
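A minimal sketch of this optimizer setup is shown below; the poly power of 0.9 and the use of a LambdaLR scheduler are assumptions, and `model` and `max_iters` are placeholders.

```python
import torch

# Hedged sketch of the optimizer described above: SGD (momentum 0.9, weight decay 5e-4),
# initial learning rate 2.5e-4, and a poly learning-rate schedule.
def build_optimizer(model: torch.nn.Module, max_iters: int = 150_000):
    optimizer = torch.optim.SGD(model.parameters(), lr=2.5e-4, momentum=0.9, weight_decay=5e-4)
    poly = lambda it: (1 - it / max_iters) ** 0.9          # poly decay toward zero over training
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly)
    return optimizer, scheduler
```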
5.4. Performance Comparison on Dark Zurich-Test
We tested the proposed method on the Dark Zurich-test dataset and compared the per-class IoU values with existing nighttime road-scene segmentation methods, including DMAda [9], GCMA [10], MGCDA [11], DANNet [15], DANIA [16], SFNet-N [19], and IA [21]. Due to the lack of large-scale labeled semantic segmentation datasets for nighttime road scenes, these previous methods are domain adaptation methods based on pseudo-labeled or unlabeled images. Table 1 shows the per-class IoU comparison, with the best results in bold and sub-optimal results underlined.
Our light-enhanced network mitigates the low brightness and color bias of nighttime driving scenes by combining a color correction module with a parameter predictor: the color correction module corrects the color distortion caused by insufficient light to restore the true colors of the image, while the parameter predictor adaptively adjusts the filter parameters to enhance brightness and contrast. Compared with the highest score obtained by an existing method (IA), the proposed method achieves a 2.5% improvement in mIoU on the Dark Zurich-test. Figure 5 shows visual comparisons of DMAda, GCMA, MGCDA, DANNet, SFNet-N, IA, and our method on five representative images. The results show that our method improves the segmentation of large categories, such as the sidewalk in Image 3, the pedestrians in Image 4, and the vehicles in Image 5, while also making the edges of small objects clearer and smoother, such as the sidewalk edge and traffic light in Image 2 and the sidewalk edges in Image 4. This shows that the proposed method improves the performance of semantic segmentation of nighttime road scenes.
5.5. Ablation Experiment
To verify the effectiveness of each component of the proposed nighttime road segmentation network, we tested different module combinations on the Dark Zurich-test dataset and also measured the time required to segment one image. The results are shown in Table 2. Compared with the RefineNet baseline (A), the full method (E) performs much better in low-light conditions, with a 32.9% improvement in mIoU. Through the style transformation of the daytime dataset, the more nighttime images are co-trained, the better the model learns: co-training with the ACDC dataset improves mIoU by 4.4% over the baseline, and co-training with both the ACDC and TransCity datasets improves it by 5.3%. In addition, the parameter predictor further improves nighttime road segmentation; for example, configuration D improves mIoU by 29.5% over the baseline. Finally, combining the color correction module with the parameter predictor, our light-enhanced network achieves an mIoU of 59.4% in nighttime driving scenes. This result reflects the advantage of our approach in handling low brightness and color bias and demonstrates how the components work together. Figure 6 presents the contribution and effect of each part, further confirming the effectiveness of the proposed enhancement method in improving image quality. Overall, the proposed method generates more distinct and accurate results and segments images better in low-light conditions.
6. Conclusions
In this study, a semantic segmentation method based on a light-enhanced network for nighttime road scenes is proposed. To improve performance, a nighttime road-scene dataset, TransCity, is constructed. The TransCity dataset focuses on nighttime road scenes and provides high-quality fine labeling, which supports the subsequent training of the segmentation network. We then propose a light-enhanced network comprising two main parts: a color correction module and a parameter predictor. The color correction module mitigates the impact of illumination variations on the segmentation network by adjusting the color information of the image, while the parameter predictor predicts the parameters of the image filter from the global content of the image, including brightness, contrast, hue, and exposure level, thereby effectively enhancing image quality. The proposed method achieves an mIoU of 59.4% on the Dark Zurich-test dataset, outperforming other semantic segmentation methods designed for night scenes.
Conceptualization and methodology, L.B., W.Z. and C.L.; validation, L.B., W.Z. and C.L.; formal analysis, L.B. and C.L.; investigation, X.Z.; resources, W.Z.; data curation, L.B.; writing—original draft preparation, L.B., W.Z. and C.L.; writing—review and editing, L.B., W.Z., X.Z. and C.L.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
We would like to express our sincere appreciation to the anonymous reviewers for their insightful comments, which have greatly aided us in improving the quality of the paper.
The authors declare no conflicts of interest.
Figure 1. Defined side windows, where $\theta$ is the angle between the window and the horizontal line, $r$ is the window radius, $\rho$ controls the window length, and $\rho \in \{0, r\}$. (a) The definition of a side window for the continuous case. (b) The eight types of windows, denoted L, R, U, D, NW, NE, SW and SE.
Figure 2. A selection of images from the TransCity dataset. The first row shows the original daytime images, and the second row shows the transformed nighttime images.
Figure 3. Overall framework of the proposed method. The three input images $I_T$, $I_C$ and $I_A$ come from the synthetic nighttime dataset TransCity, the daytime dataset Cityscapes and the real nighttime dataset ACDC-night, respectively, and are enhanced by the light-enhanced network. All the outputs are then fed into the segmentation network to obtain the prediction results. Finally, the network is guided to accurately predict the category of each pixel by minimizing the segmentation loss.
Figure 5. Comparison of different nighttime semantic segmentation algorithms on Dark Zurich-test.
The IoU (%) results of the different methods on Dark Zurich-test. The best results are bolded and sub-optimal results are underlined.
Category | DMAda | GCMA | MGCDA | DANNet | DANIA | SFNet-N | IA | Ours
---|---|---|---|---|---|---|---|---
road | 75.5 | 81.7 | 80.3 | 90.0 | 90.8 | 94.3 | 93.3 | 93.6
sidewalk | 29.1 | 46.9 | 49.3 | 54.0 | 59.7 | 74.0 | 70.0 | 70.8
building | 48.6 | 58.8 | 66.2 | 74.8 | 73.7 | 79.4 | 80.6 | 81.1
wall | 21.3 | 22.0 | 7.8 | 41.0 | 39.9 | 43.8 | 47.3 | 46.2
fence | 14.3 | 20.0 | 11.0 | 21.1 | 26.3 | 31.9 | 28.7 | 33.3
pole | 34.3 | 41.2 | 41.4 | 25.0 | 36.7 | 43.8 | 54.5 | 54.3
traffic light | 36.8 | 40.5 | 38.9 | 26.8 | 33.8 | 57.9 | 49.6 | 48.9
traffic sign | 29.9 | 41.6 | 39.0 | 30.2 | 32.4 | 50.1 | 53.7 | 55.3
vegetation | 49.4 | 64.8 | 64.1 | 72.0 | 70.5 | 73.4 | 78.4 | 77.0
terrain | 13.8 | 31.0 | 18.0 | 26.2 | 32.1 | 36.1 | 35.5 | 26.4
sky | 0.4 | 32.1 | 55.8 | 84.0 | 85.1 | 85.5 | 89.4 | 88.2
person | 43.3 | 53.5 | 52.1 | 47.0 | 43.0 | 60.6 | 57.8 | 58.9
rider | 50.2 | 47.5 | 53.5 | 33.9 | 42.2 | 53.6 | 44.0 | 49.4
car | 69.4 | 72.5 | 74.7 | 68.2 | 72.8 | 86.9 | 82.8 | 82.3
truck | 18.4 | 39.2 | 66.0 | 19.0 | 13.4 | 8.2 | 33.7 | 89.1
bus | 0.0 | 0.0 | 0.0 | 0.3 | 0.0 | 41.2 | 16.0 | 3.6
train | 27.6 | 49.6 | 37.5 | 66.4 | 71.6 | 82.2 | 80.9 | 84.8
motorcycle | 34.9 | 30.7 | 29.1 | 38.3 | 48.9 | 45.2 | 50.3 | 54.1
bicycle | 11.9 | 21.0 | 22.7 | 23.6 | 23.9 | 33.7 | 34.2 | 32.8
mIoU (%) | 32.1 | 42.0 | 42.5 | 44.3 | 47.2 | 56.9 | 56.9 | 59.4
Ablation experiment results on Dark Zurich-test. The best results are bolded.
Number | Methods | Cityscapes | ACDC | TransCity | mIoU (%) | Inference Time (s)
---|---|---|---|---|---|---
A | baseline | √ | | | 26.5 | 0.407
B | — | √ | √ | | 30.9 | 0.409
C | — | √ | √ | √ | 31.8 | 0.412
D | w/ parameter predictor | √ | √ | √ | 56.0 | 0.568
E | Ours | √ | √ | √ | 59.4 | 0.586
References
1. Wang, Y.; Zhang, J.; Chen, Y. An automated learning method of semantic segmentation for train autonomous driving environment understanding. IEEE Trans. Ind. Inf.; 2024; 20, pp. 6913-6922. [DOI: https://dx.doi.org/10.1109/TII.2024.3353874]
2. Wang, H.; Zhu, S.; Chen, L.; Li, Y.; Cai, Y. OccludedInst: An Efficient Instance Segmentation Network for Automatic Driving Occlusion Scenes. IEEE Trans. Emerging Top. Comput. Intell.; 2024; pp. 1-18. [DOI: https://dx.doi.org/10.1109/TETCI.2024.3414948]
3. Liang, J.; Li, Y.; Yin, G.; Xu, L.; Lu, Y.; Feng, J.; Shen, T.; Cai, G. A MAS-Based Hierarchical Architecture for the Cooperation Control of Connected and Automated Vehicles. IEEE Trans. Veh. Technol.; 2023; 72, pp. 1559-1573. [DOI: https://dx.doi.org/10.1109/TVT.2022.3211733]
4. Chen, Z.; Yan, R.; Ma, Y.; Sui, Y.; Xue, J. A smart status based monitoring algorithm for the dynamic analysis of memory safety. ACM Trans. Softw. Eng. Methodol.; 2024; 33, pp. 1-47. [DOI: https://dx.doi.org/10.1145/3637227]
5. Ahmed, A.; Abdelfatah, D. Enhancing security in X-ray baggage scans: A contour-driven learning approach for abnormality classification and instance segmentation. Eng. Appl. Artif. Intell.; 2024; 130, 107639. [DOI: https://dx.doi.org/10.1016/j.engappai.2023.107639]
6. Siddiquee, M.; Masudur, R. Machine Learning Approach for Spatiotemporal Multivariate Optimization of Environmental Monitoring Sensor Locations. Artif. Intell. Earth Syst.; 2024; 3, e230011. [DOI: https://dx.doi.org/10.1175/AIES-D-23-0011.1]
7. Akram, W.; Hassan, T.; Toubar, H.; Ahmed, M.; Miškovic, N.; Seneviratne, L.; Hussain, I. Aquaculture defects recognition via multi-scale semantic segmentation. Expert Syst. Appl.; 2024; 237, 121197. [DOI: https://dx.doi.org/10.1016/j.eswa.2023.121197]
8. Wulfmeier, M.; Bewley, A.; Posner, I. Addressing appearance change in outdoor robotics with adversarial domain adaptation. Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Vancouver, BC, Canada, 24–28 September 2017; pp. 1551-1558.
9. Dai, D.; Van Gool, L. Dark model adaptation: Semantic image segmentation from daytime to nighttime. Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC); Maui, HI, USA, 4–7 November 2018; pp. 3819-3824.
10. Sakaridis, C.; Dai, D.; Van Gool, L. Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV); Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7373-7382.
11. Sakaridis, C.; Dai, D.; Van Gool, L. Map-guided curriculum domain adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. IEEE Trans. Pattern Anal. Mach. Intell.; 2022; 44, pp. 3139-3153. [DOI: https://dx.doi.org/10.1109/TPAMI.2020.3045882] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33338013]
12. Xu, Q.; Ma, Y.; Wu, J.; Long, C.; Huang, X. CDAda: A curriculum domain adaptation for nighttime semantic segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW); Montreal, QC, Canada, 11–17 October 2021; pp. 2962-2971.
13. Romera, E.; Bergasa, L.M.; Yang, K.; Alvarez, J.M.; Barea, R. Bridging the day and night domain gap for semantic segmentation. Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV); Paris, France, 9–12 June 2019; pp. 1312-1318.
14. Sun, L.; Wang, K.; Yang, K.; Xiang, K. See clearer at night: Towards robust nighttime semantic segmentation through day-night image conversion. Proceedings of the Artificial Intelligence and Machine Learning in Defense Applications; Strasbourg, France, 9–12 September 2019; Volume 11169, pp. 77-89.
15. Wu, X.; Wu, Z.; Guo, H.; Ju, L.; Wang, S. DANNet: A one-stage domain adaptation network for unsupervised nighttime semantic segmentation. Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Nashville, TN, USA, 20–25 June 2021; pp. 15764-15773.
16. Wu, X.; Wu, Z.; Ju, L.; Wang, S. A one-stage domain adaptation network with image alignment for unsupervised nighttime semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell.; 2023; 45, pp. 58-72. [DOI: https://dx.doi.org/10.1109/TPAMI.2021.3138829] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34962864]
17. Gao, H.; Guo, J.; Wang, G.; Zhang, Q. Cross-domain correlation distillation for unsupervised domain adaptation in nighttime semantic segmentation. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); New Orleans, LA, USA, 18–24 June 2022; pp. 9903-9913.
18. Tan, X.; Xu, K.; Cao, Y.; Zhang, Y.; Ma, L.; Lau, R. Night-time Scene Parsing with a Large Real Dataset. IEEE Trans. Image Process.; 2021; 30, pp. 9085-9098. [DOI: https://dx.doi.org/10.1109/TIP.2021.3122004] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34705644]
19. Wang, H.; Chen, Y.; Cai, Y.; Chen, L.; Li, Y.; Sotelo, M.A.; Li, Z. SFNET-N: An improved sfnet algorithm for semantic segmentation of low-light autonomous driving road scenes. IEEE Trans. Intell. Transp. Syst.; 2022; 23, pp. 21405-21417. [DOI: https://dx.doi.org/10.1109/TITS.2022.3177615]
20. Li, C.; Zang, W.; Shao, Z.; Ma, L.; Wang, X. Semantic segmentation method on nighttime road scene based on Trans-nightSeg. J. Zhejiang Univ. Eng. Sci.; 2024; 58, pp. 294-303.
21. Liu, W.; Li, W.; Zhu, J.; Cui, M.; Xie, X.; Zhang, L. Improving nighttime driving-scene segmentation via dual image-adaptive learnable filters. IEEE Trans. Circuits Syst. Video Technol.; 2023; 33, pp. 5855-5867. [DOI: https://dx.doi.org/10.1109/TCSVT.2023.3260240]
22. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Las Vegas, NV, USA, 27–30 June 2016; pp. 3213-3223.
23. Sakaridis, C.; Dai, D.; Van Gool, L. Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV); Montreal, QC, Canada, 10–17 October 2021; pp. 10745-10755.
24. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM; 2020; 63, pp. 139-144. [DOI: https://dx.doi.org/10.1145/3422622]
25. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Honolulu, HI, USA, 21–26 July 2017; pp. 5967-5976.
26. Karacan, L.; Akata, Z.; Erdem, A.; Erdem, E. Learning to generate images of outdoor scenes from attributes and semantic layouts. arXiv; 2016; arXiv: 1612.00215
27. Chen, Y.; Lai, Y.-K.; Liu, Y.-J. Cartoongan: Generative adversarial networks for photo cartoonization. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; Salt Lake City, UT, USA, 18–23 June 2018; pp. 9465-9474.
28. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell.; 2017; 39, pp. 640-651.
29. Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Honolulu, HI, USA, 21–26 July 2017; pp. 5168-5177.
30. Bijelic, M.; Gruber, T.; Ritter, W. Benchmarking image sensors under adverse weather conditions for autonomous driving. Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV); Changshu, China, 26–30 June 2018; pp. 1773-1779.
31. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Seattle, WA, USA, 13–19 June 2020; pp. 2633-2642.
32. Hu, Y.; He, H.; Xu, C.; Wang, B.; Lin, S. Exposure: A white-box photo post-processing framework. ACM Trans. Graph.; 2018; 37, pp. 1-17. [DOI: https://dx.doi.org/10.1145/3181974]
© 2024 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
To solve the semantic segmentation problem of nighttime driving-scene images, which often have low brightness, low contrast, and uneven illumination, a nighttime driving-scene segmentation method based on a light-enhanced network is proposed. First, we design a light-enhanced network, which comprises two parts: a color correction module and a parameter predictor. The color correction module mitigates the impact of illumination variations on the segmentation network by adjusting the color information of the image. Meanwhile, the parameter predictor accurately predicts the parameters of the image filter through the analysis of global content, including factors such as brightness, contrast, hue, and exposure level, thereby effectively enhancing the image quality. Subsequently, the output of the light-enhanced network is input into the segmentation network to obtain the final segmentation prediction. Experimental results show that the proposed method achieves a mean Intersection over Union (mIoU) of 59.4% on the Dark Zurich-test dataset, outperforming other segmentation algorithms for nighttime driving scenes.
1 School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
2 School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450000, China