1. Introduction
Traffic conditions in urban areas can be highly complex, since vehicles, pedestrians, and riders may share the same road, especially in developing countries. In recent years, the rapid rise of the bike-sharing and food-delivery industries has aggravated this phenomenon to a certain extent. The coexistence of vehicles, bikes, riders, and pedestrians poses great challenges to driving safety in urban areas, and the detection and classification of vehicles, bikes, riders, and pedestrians is essential for intelligent and connected vehicles (ICVs) [1].
Vision-based object detection and classification is an important approach to traffic obstacle detection and classification. Traditional methods extract object features with handcrafted descriptors such as the scale-invariant feature transform (SIFT) [2] and the histogram of oriented gradients (HOG) [3] and feed the extracted features into classifiers such as the support vector machine (SVM) [4] and AdaBoost [5]. The design of these features can be very complicated; in particular, because the features are handcrafted, their performance is task-dependent, which does not scale to large applications and generalizes poorly. Traditional machine learning detection methods can therefore barely meet the requirements of practical applications, and new detection methods are needed. With the development of deep learning, many deep learning techniques have been applied to object detection, among which the deep convolutional neural network (CNN) [6] is the most prominent. Unlike traditional feature extraction algorithms that rely on domain knowledge, CNNs have been shown to be robust to geometric transformation, deformation, and illumination changes, thus effectively overcoming the difficulties caused by the variability of non-motorized vehicle appearance. They can also adaptively capture complex feature patterns by learning from data, which gives them high flexibility and generalization ability. Many deep learning-based object detection methods have been proposed in recent years, including one-stage and two-stage detection methods, as shown in Figure 1 [7]. One-stage detection algorithms, such as YOLO [8], SSD [9], and RetinaNet [10], do not predict region proposals; instead, they directly generate the labels and locations of objects, so the final detection result is obtained end-to-end in a single pass and the detection speed is higher. In contrast, two-stage detection algorithms divide the detection problem into two stages: region proposals are first generated and then classified, and in most cases the predicted positions are further refined. A typical example of the two-stage approach is the R-CNN family of region proposal-based algorithms, including R-CNN [11], SPPNet [12], Fast R-CNN [13], Faster R-CNN [14], and FPN [15].
[figure(s) omitted; refer to PDF]
Among the aforementioned state-of-the-art object detection algorithms, YOLOv3 [16] and YOLOv4 [17] are arguably the most promising. Proposed by Redmon et al. in 2018 and by Bochkovskiy et al. in 2020, respectively, YOLOv3 and YOLOv4 offer both high detection speed and high accuracy and can be used for the detection and classification of traffic obstacles. Many studies on traffic obstacle detection have been based on YOLO [18–22]. Wang et al. [18] used YOLOv3 to detect vehicles, pedestrians, and non-motor vehicles, improving detection accuracy. Narayanan et al. [19] proposed a model combining HOG and the YOLO algorithm for pedestrian detection in thermal images. Hung et al. [20] performed real-time obstacle detection with the YOLO model on an embedded system. Wang et al. [21] proposed a real-time vehicle detection algorithm that fuses vision and lidar point cloud information and achieves high detection accuracy and good real-time performance. Arvind et al. [22] developed a near-range obstacle sensing system based on a vision sensor, which enables early detection and tracking of obstacles. Zhang et al. [23] proposed a classification method for four classes of moving objects using 3D point clouds, which recognizes moving objects effectively. Feng et al. [24] presented a 32-layer multibranch method for object detection in traffic scenes, which achieved state-of-the-art performance. Li et al. [25] proposed an improved multivehicle detection method considering traffic flow, which achieved good performance and robustness. Wang et al. [26] presented a vision-based crash detection framework for mixed traffic flow environments, which achieved a high detection rate with a relatively low false alarm rate. Cai et al. [27] presented an improved framework for object detection based on YOLOv4. Hnewa et al. [28] reviewed state-of-the-art frameworks for object detection under rainy conditions. Liu et al. [29] proposed a radar and camera information fusion method for object recognition. Bell et al. [30] presented a real-time system for nighttime vehicle detection. Satyanarayana et al. [31] proposed a vehicle detection technique for heterogeneous and lane-less traffic. However, these studies seldom carried out on-vehicle real-time detection and classification of traffic obstacles based on the target characteristics of real mixed traffic scenes, and both detection accuracy and real-time performance can be further improved.
In the task of traffic obstacle identification and classification, every misclassification is usually treated as equally costly. In actual applications, however, different misclassifications can have significantly different consequences for ICVs: some lead only to minor errors, while others can be disastrous. To improve the safety of ICVs and avoid disastrous consequences caused by wrong predictions, different weights should be assigned to different mislabelled results. Recently, the application of the Wasserstein distance in machine learning has attracted much attention [32]. The Wasserstein distance [33] is a measure of the distance between probability distributions; combining it with the loss function of YOLO can effectively reduce the probability of intolerable misclassifications in ICVs, thereby reducing the safety risk caused by misclassification.
In this paper, an improved Wasserstein distance loss is proposed based on the YOLO model. The main contributions of this paper can be summarized as follows:
(i) A new dataset of traffic obstacles, including vehicles, bikes, riders, and pedestrians, collected at different times of day and under different weather conditions in the urban environment of Wuhan, China, is established for detection.
(ii) Based on the YOLO network, an improved model is designed for traffic obstacle detection. The Wasserstein distance-based loss, which assigns different penalties to a sample being classified into different classes so that misclassified objects tend to be assigned to similar classes, is combined with the loss function of YOLO to enhance the performance of traffic obstacle detection.
(iii) The improved model is deployed on NVIDIA TX2 for real-time detection and then compared with the original model. Empirical experiments show that the improved model presents more accurate and robust results than the original model, and its real-time performance can basically meet the requirements of real-time detection applications.
The remainder of this paper is organized as follows. In Section 2, the dataset collected in real scenes is described. Section 3 presents the Wasserstein loss-based YOLO model, including the network architecture of the designed model and the loss function for training it. The experimental results are reported in Section 4. Finally, the conclusions are presented in Section 5.
2. Dataset
2.1. Data Acquisition
In order to achieve accurate and efficient traffic obstacle detection, image data specifically for traffic obstacles including vehicle, bike, rider, and pedestrian were collected by a camera at a 1920
2.2. Data Classification
In the urban hybrid traffic scenario, vehicles, bikes, riders, and pedestrians are the main traffic obstacles that affect the driving safety of intelligent and connected vehicles. Therefore, as shown in Figure 2, the detection objects in the collected data are divided into these four categories.
[figure(s) omitted; refer to PDF]
2.3. Data Augmentation
As shown in Figure 3, in order to enrich the dataset and enhance robustness, data augmentation operations including rotation and brightness transformation were performed on the image data. After augmentation, the dataset contains a total of 2976 images of hybrid traffic scenes.
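The rotation and brightness transformations can be implemented with standard image operations. The following is a minimal OpenCV-based sketch of such augmentations; the rotation angle, brightness factors, and function names are illustrative assumptions, not the parameters or code used to build the dataset.

```python
import cv2
import numpy as np

def rotate(image, angle_deg=10):
    """Rotate the image around its centre by a small angle (illustrative value)."""
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))

def adjust_brightness(image, factor=1.3):
    """Scale pixel intensities to simulate a brighter or darker scene (illustrative factor)."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Placeholder image standing in for one collected frame.
img = np.zeros((480, 640, 3), dtype=np.uint8)
augmented = [rotate(img), adjust_brightness(img, 0.7), adjust_brightness(img, 1.3)]
```

In practice, the bounding-box annotations must be transformed consistently with each rotated image so that the labels remain valid after augmentation.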
[figure(s) omitted; refer to PDF]
2.4. Data Annotation
After the above processing, the dataset was manually labelled. Objects with less than 50% of their contour visible and small targets that could not be seen clearly were not labelled. The detailed sample size of each labelled category is shown in Table 1.
Table 1
Detailed sample size of each category.
Category | Vehicle | Bike | Rider | Pedestrian |
Sample size | 10668 | 1440 | 5814 | 13182 |
3. Methodology
3.1. YOLO Model
In this paper, YOLO-based detection models, including YOLOv3, YOLOv4, and YOLOv4-tiny, are established. In the YOLOv3 model [16], the input image is divided into an S × S grid; each grid cell predicts a fixed number of bounding boxes together with their confidence scores and class probabilities, and the grid cell containing the centre of an object is responsible for detecting it.
The network is composed mainly of a series of 1×1 and 3×3 convolutional layers, each followed by a batch normalization (BN) layer and a LeakyReLU activation. Detection is performed at three scales, corresponding to downsampling factors of 32 (2^5), 16 (2^4), and 8 (2^3). After the 79th convolutional layer, the feature map passes through several further convolutional layers to produce the first detection output, which is downsampled 32 times relative to the input image; because of this high downsampling factor, its receptive field is large, making it suitable for detecting relatively large objects. To achieve finer-grained detection, the feature map of the 79th layer is upsampled and fused by concatenation with the feature map of the 61st layer, yielding the finer feature map of the 91st layer; after several further convolutional layers, this produces a detection output downsampled 16 times relative to the input image, with a medium receptive field suitable for medium-sized objects. Finally, the 91st-layer feature map is upsampled again and fused with the 36th-layer feature map to obtain an output downsampled 8 times relative to the input image, which has the smallest receptive field and is suitable for detecting small objects.
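As a concrete illustration of the three detection scales, the following minimal sketch computes the grid sizes and output-tensor shapes produced by the three downsampling factors for the four-class detection task in this paper. The 416 × 416 input size, the three anchors per scale, and the function name are common YOLOv3 defaults assumed here for illustration, not values stated in the paper.

```python
# Minimal sketch: grid sizes and output-tensor shapes of the three YOLOv3
# detection heads. The 416x416 input and 3 anchors per scale are assumed defaults.
def yolo_head_shapes(input_size=416, num_classes=4, anchors_per_scale=3):
    shapes = []
    for stride in (32, 16, 8):           # downsampling factors of the three heads
        grid = input_size // stride       # e.g. 13, 26, 52 for a 416x416 input
        # each cell predicts anchors_per_scale boxes: (x, y, w, h, objectness) + class scores
        channels = anchors_per_scale * (5 + num_classes)
        shapes.append((grid, grid, channels))
    return shapes

print(yolo_head_shapes())  # [(13, 13, 27), (26, 26, 27), (52, 52, 27)]
```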
YOLOv4 [17] makes a series of improvements on the basis of YOLOv3, mainly including the following: the backbone feature extraction network is changed from DarkNet53 to CSPDarkNet53 [34], the feature pyramid part is changed to SPP [35] and PAN [36], and the classification and regression layer is unchanged from YOLOv3.
The YOLOv4-tiny network structure is a simplified version of YOLOv4. It is a lightweight model with only about 6 million parameters, roughly one-tenth of the original. As shown in Figure 4, the overall network structure has 38 layers and uses three residual units; the activation function is LeakyReLU; the classification and regression of the target use two feature layers; and a feature pyramid network (FPN) is used when merging the effective feature layers. It also adopts the CSPNet structure: channel splitting is performed in the feature extraction network, the feature-map channels output after the 3×3 convolution are divided into two parts, and only the second part is processed further. The detection speed of the YOLOv4-tiny model is thus greatly improved, which makes it possible to deploy the model on mobile embedded terminals such as the NVIDIA TX2 for real-time detection.
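The channel split described above can be sketched as follows in PyTorch. This is a minimal illustrative block, assuming a 3×3 convolution followed by a split in which only the second half of the channels is processed further; the module name, channel counts, and layer arrangement are assumptions, not the exact layers of YOLOv4-tiny.

```python
import torch
import torch.nn as nn

class CSPSplitBlock(nn.Module):
    """Illustrative CSP-style block: a 3x3 convolution, a channel split, and
    further processing of only the second half of the channels."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1, inplace=True),
        )
        half = channels // 2
        self.partial = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        x = self.conv(x)
        # split the channels into two halves and keep only the second half
        _, second = torch.chunk(x, 2, dim=1)
        return self.partial(second)

# usage: out = CSPSplitBlock(64)(torch.randn(1, 64, 52, 52))
```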
[figure(s) omitted; refer to PDF]
3.2. Wasserstein Distance-Based Loss
To alleviate the undesirable consequences caused by misclassification, we propose to incorporate the Wasserstein distance into the framework of YOLO and apply it to ICVs. The Wasserstein distance is a metric for measuring the discrepancy or dissimilarity between probability measures, and it quantifies the cost of transporting one distribution onto another [37]. For discrete distributions $u$ and $v$ supported on $n$ and $m$ points, it can be written as the optimal transport cost
$$W(u, v) = \min_{T \in \Pi(u, v)} \langle T, M \rangle = \min_{T \in \Pi(u, v)} \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij} M_{ij},$$
where $M_{ij}$ denotes the cost of moving probability mass from the $i$-th point of $u$ to the $j$-th point of $v$ and $\Pi(u, v) = \{ T \in \mathbb{R}_{+}^{n \times m} : T \mathbf{1}_m = u,\ T^{\top} \mathbf{1}_n = v \}$ is the set of transport plans whose marginals are $u$ and $v$ [38].
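To make the discrete formulation concrete, the following sketch solves the small linear program above with SciPy. It is a generic illustration of discrete optimal transport under an assumed toy cost matrix, not the code or cost values used in this paper; the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_discrete(u, v, M):
    """Exact discrete Wasserstein (optimal transport) cost between histograms
    u (length n) and v (length m) under cost matrix M (n x m), solved as the
    linear program: minimise <T, M> s.t. T 1 = u, T^T 1 = v, T >= 0."""
    n, m = M.shape
    # row-sum constraints: sum_j T[i, j] = u[i]
    A_row = np.zeros((n, n * m))
    for i in range(n):
        A_row[i, i * m:(i + 1) * m] = 1.0
    # column-sum constraints: sum_i T[i, j] = v[j]
    A_col = np.zeros((m, n * m))
    for j in range(m):
        A_col[j, j::m] = 1.0
    res = linprog(M.ravel(), A_eq=np.vstack([A_row, A_col]),
                  b_eq=np.concatenate([u, v]), bounds=(0, None), method="highs")
    return res.fun

# Toy example over the four classes (vehicle, bike, rider, pedestrian); the
# cost values are illustrative assumptions only.
M = np.array([[0, 3, 2, 4], [3, 0, 1, 2], [2, 1, 0, 1], [4, 2, 1, 0]], float)
p = np.array([0.7, 0.1, 0.1, 0.1])   # predicted class distribution
q = np.array([0.0, 0.0, 1.0, 0.0])   # one-hot ground truth (rider)
print(wasserstein_discrete(p, q, M))  # 1.6 for this toy example
```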
In object detection, we consider the source distribution to be the class probability distribution predicted by the network for a detected object and the target distribution to be the one-hot distribution of its ground-truth label. The cost matrix assigns larger values to misclassifications between dissimilar classes and smaller values to those between similar classes, so that minimizing the Wasserstein distance encourages the model, when it does err, to confuse an object with a similar class rather than with a dissimilar one.
[figure(s) omitted; refer to PDF]
Denote by $L_{\mathrm{YOLO}}$ the original loss function of YOLO and by $L_{W}$ the Wasserstein distance-based classification loss described above. The improved model is trained with a combined loss in which $L_{W}$ is added to $L_{\mathrm{YOLO}}$, so that the localization and confidence terms of the original model are preserved while severe misclassifications are penalized more heavily according to the cost matrix.
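The paper does not reproduce the exact cost matrix or the weighting between the two terms. As a minimal sketch under the assumption that the target is a one-hot ground-truth label, the Wasserstein classification term reduces to the expected cost of the predicted distribution with respect to the true class's column of the cost matrix; the matrix values and function name below are illustrative assumptions.

```python
import numpy as np

# Illustrative cost matrix over (vehicle, bike, rider, pedestrian); larger
# entries mark misclassifications assumed to be more severe. These values are
# not the ones used in the paper.
COST = np.array([[0, 3, 2, 4],
                 [3, 0, 1, 2],
                 [2, 1, 0, 1],
                 [4, 2, 1, 0]], dtype=float)

def wasserstein_class_loss(pred_probs, true_class, cost=COST):
    """For a one-hot target, the optimal transport cost equals the expected
    cost of the predicted distribution w.r.t. the true class's cost column."""
    return float(np.dot(pred_probs, cost[:, true_class]))

pred = np.array([0.7, 0.1, 0.1, 0.1])    # network's class probabilities
print(wasserstein_class_loss(pred, 2))   # true class: rider -> 1.6
```

For the same toy inputs, this closed form returns the same value (1.6) as the exact linear-program sketch given earlier, which is why a one-hot target makes the Wasserstein term cheap to compute inside a training loss.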
4. Experimental Results
4.1. Experimental Environment
The models were trained and tested on a Windows PC with two Intel Xeon processors running at 3.5 GHz, 128 GB of DDR4 memory, and an NVIDIA GeForce RTX 2080 GPU with 8 GB of memory. The established dataset is divided into a training set and a test set at a ratio of 9 : 1. During training, all layers except the three output layers were first frozen until the loss stabilised and were then unfrozen, and training was continued to fine-tune the whole network. To avoid overfitting, training was terminated when the loss did not decrease within ten epochs. In addition, the original and improved YOLO models were deployed on an NVIDIA TX2 for real-time detection.
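The freeze-then-fine-tune schedule with early stopping could be organised as in the following sketch. The deep learning framework (TensorFlow/Keras here), the epoch counts, the optimizer, and the placeholder objects `model`, `yolo_loss`, `train_data`, and `val_data` are all assumptions for illustration; the paper does not specify its training code.

```python
from tensorflow.keras.callbacks import EarlyStopping

def two_stage_training(model, yolo_loss, train_data, val_data):
    """Illustrative freeze-then-fine-tune schedule with early stopping."""
    # Stage 1: freeze all layers except the three output layers so the
    # detection heads can first reach a stable loss.
    for layer in model.layers[:-3]:
        layer.trainable = False
    model.compile(optimizer="adam", loss=yolo_loss)
    model.fit(train_data, validation_data=val_data, epochs=50)

    # Stage 2: unfreeze everything and fine-tune; stop when the monitored
    # loss has not improved for ten consecutive epochs.
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer="adam", loss=yolo_loss)
    stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
    model.fit(train_data, validation_data=val_data, epochs=300, callbacks=[stop])
```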
4.2. Evaluation Metric
In this study, the precision-recall curve (P-R curve), F1 score, and mean average precision (mAP) were used to evaluate the performance of the model.
The P-R curve plots the precision (P) on the ordinate against the recall (R) on the abscissa, where P can be defined as
$$P = \frac{TP}{TP + FP},$$
and R can be defined as
$$R = \frac{TP}{TP + FN},$$
where TP, FP, FN, and TN are defined in Table 2.
Table 2
Definition of different detection results.
Labelled | Detected | Definition |
Positive | Positive | TP |
Positive | Negative | FN |
Negative | Positive | FP |
Negative | Negative | TN |
F1 score, an index that comprehensively considers the values of P and R to reflect the performance of the detection model, can be defined as
$$F1 = \frac{2 \times P \times R}{P + R}.$$
The area under the P-R curve is the value of the average precision (AP), and the mean of the AP values over the four categories of obstacle objects in the hybrid traffic scene is the mAP. The AP and mAP values can be defined as
$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i,$$
where $N = 4$ is the number of obstacle categories and $AP_i$ is the AP of the $i$-th category.
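The following is a minimal numpy sketch of these metrics, assuming detections have already been matched to ground truth; the matching step and any interpolation scheme for the P-R curve, which the paper does not specify, are outside its scope, and the function names are hypothetical.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from matched detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(precisions, recalls):
    """Area under the P-R curve via simple numerical integration over recall;
    the interpolation scheme is an implementation choice."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))

def mean_average_precision(ap_per_class):
    """mAP over the four obstacle categories."""
    return float(np.mean(ap_per_class))

# toy usage
print(precision_recall_f1(tp=90, fp=10, fn=20))
print(mean_average_precision([0.99, 0.99, 0.98, 0.97]))
```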
4.3. Result of Designed Models on Established Dataset
In order to verify the detection effect of the designed models, the original and improved YOLOv3, YOLOv4, and YOLOv4-tiny models were trained and evaluated on the four categories of obstacle objects. The loss curves of the designed models are shown in Figures 6, 7, and 8, respectively.
[figure(s) omitted; refer to PDF]
It can be seen from the loss curves that the loss value of the improved model is higher than that of the original model at the beginning of training, and the loss values of the improved and original models are basically the same once training stabilizes. This is because of the Wasserstein distance-based loss added to the improved model. The final loss values of the YOLOv3, YOLOv4, and YOLOv4-tiny models are about 24.5, 10, and 11.5, respectively.
The experimental results of the designed models are shown in Table 3, and the P-R curves are shown in Figure 9. It can be seen from the experimental results that the mAP of the improved YOLOv3, YOLOv4, and YOLOv4-tiny models is 98.57%, 98.19%, and 80.39%, respectively, slightly higher than that of the corresponding original models, while the F1 scores of the improved models are basically the same as those of the original models.
Table 3
Experimental results of the designed models on the established dataset.
Category | YOLOv3 AP | YOLOv3 F1 | Improved YOLOv3 AP | Improved YOLOv3 F1 |
Vehicle | 99.30% | 0.97 | 99.43% | 0.97 |
Bike | 99.36% | 0.99 | 99.35% | 0.99 |
Rider | 98.16% | 0.95 | 98.39% | 0.95 |
Pedestrian | 96.66% | 0.93 | 97.09% | 0.94 |
mAP | 98.37% | | 98.57% | |

Category | YOLOv4 AP | YOLOv4 F1 | Improved YOLOv4 AP | Improved YOLOv4 F1 |
Vehicle | 98.66% | 0.96 | 99.00% | 0.96 |
Bike | 99.49% | 0.97 | 99.33% | 0.96 |
Rider | 97.44% | 0.95 | 97.52% | 0.95 |
Pedestrian | 97.09% | 0.93 | 96.91% | 0.93 |
mAP | 98.17% | | 98.19% | |

Category | YOLOv4-tiny AP | YOLOv4-tiny F1 | Improved YOLOv4-tiny AP | Improved YOLOv4-tiny F1 |
Vehicle | 86.98% | 0.83 | 86.77% | 0.83 |
Bike | 83.05% | 0.85 | 83.47% | 0.85 |
Rider | 80.19% | 0.79 | 80.44% | 0.78 |
Pedestrian | 70.55% | 0.72 | 70.89% | 0.72 |
mAP | 80.19% | | 80.39% | |
[figure(s) omitted; refer to PDF]
4.4. Result of Designed Models on BDD Dataset
BDD is one of the most recently published autonomous driving datasets with dense traffic scenes, and it is also used to verify the detection performance of the designed models. In the BDD dataset, there are few objects in the bike and rider categories, so the data containing these two categories of objects were selected for testing to maintain a relative balance among the categories. The experimental results of the designed models are shown in Table 4, and the P-R curves are shown in Figure 10.
Table 4
Experimental results of the designed models on the BDD dataset.
Category | YOLOv3 AP | YOLOv3 F1 | Improved YOLOv3 AP | Improved YOLOv3 F1 |
Vehicle | 97.70% | 0.95 | 97.80% | 0.95 |
Bike | 81.27% | 0.86 | 81.55% | 0.87 |
Rider | 97.10% | 0.92 | 96.97% | 0.92 |
Pedestrian | 95.37% | 0.92 | 95.54% | 0.92 |
mAP | 92.86% | | 92.97% | |

Category | YOLOv4 AP | YOLOv4 F1 | Improved YOLOv4 AP | Improved YOLOv4 F1 |
Vehicle | 97.13% | 0.94 | 97.07% | 0.94 |
Bike | 77.37% | 0.84 | 77.82% | 0.84 |
Rider | 95.60% | 0.91 | 95.46% | 0.92 |
Pedestrian | 94.42% | 0.91 | 94.57% | 0.91 |
mAP | 91.13% | | 91.23% | |

Category | YOLOv4-tiny AP | YOLOv4-tiny F1 | Improved YOLOv4-tiny AP | Improved YOLOv4-tiny F1 |
Vehicle | 86.56% | 0.82 | 86.57% | 0.83 |
Bike | 74.60% | 0.78 | 75.44% | 0.78 |
Rider | 79.52% | 0.78 | 79.51% | 0.78 |
Pedestrian | 70.37% | 0.72 | 70.34% | 0.72 |
mAP | 77.76% | | 77.97% | |
[figure(s) omitted; refer to PDF]
It can be seen from the experimental results that the mAP of the improved YOLOv3, YOLOv4, and YOLOv4-tiny models is 92.97%, 91.23%, and 77.97%, respectively, higher than that of the corresponding original models, while the F1 scores of the improved models are basically the same as those of the original models. The mAP values of the designed models on the BDD dataset are lower than those on the established dataset because the models were trained on the training set of the established dataset, whose scenes resemble its test set but differ from those of the BDD dataset. However, the detection results on both datasets could meet the basic application requirements.
4.5. The Application-Oriented Performance on NVIDIA TX2
The NVIDIA TX2 is an embedded computing platform that can be mounted directly on the vehicle. The vehicle application scenarios on the NVIDIA TX2 are shown in Figure 11. The trained improved and original models were each deployed on the NVIDIA TX2 and then tested on the established dataset. In addition, the NVIDIA TX2 with a camera was installed on the vehicle for real-time detection to verify the detection effect and real-time performance of the proposed model.
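The per-frame detection speed on such a platform can be measured as in the following minimal sketch; `detect_fn` is a placeholder for the trained model's per-frame inference call, and the use of OpenCV for camera capture, the frame count, and the camera index are illustrative assumptions rather than details given in the paper.

```python
import time
import cv2  # OpenCV, assumed available for camera capture on the target platform

def measure_fps(detect_fn, num_frames=200, camera_index=0):
    """Average detection speed (frames per second) over live camera frames."""
    cap = cv2.VideoCapture(camera_index)
    start, processed = time.time(), 0
    while processed < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        detect_fn(frame)          # run the detector on one frame
        processed += 1
    cap.release()
    elapsed = time.time() - start
    return processed / elapsed if elapsed > 0 else 0.0
```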
[figure(s) omitted; refer to PDF]
The detection speed of the different models is shown in Table 5. As can be seen from the table, the detection speed of the improved YOLOv3 and YOLOv4 models is between 3 and 4 fps on the NVIDIA TX2 and between 8 and 9 fps on the Windows PC, which falls short of real-time requirements. The detection speed of the improved YOLOv4-tiny model is above 22 fps on the NVIDIA TX2 and above 27 fps on the Windows PC, which can basically realize real-time detection of traffic obstacles.
Table 5
Detection speed of different models.
Platform | Model | YOLOv3 (fps) | YOLOv4 (fps) | YOLOv4-tiny (fps) |
Windows PC | Original model | 8.6786 | 8.0926 | 27.7137 |
Windows PC | Improved model | 8.8412 | 8.2137 | 27.7925 |
NVIDIA TX2 | Original model | 3.8385 | 3.3939 | 22.4285 |
NVIDIA TX2 | Improved model | 3.8586 | 3.4049 | 22.5928 |
The real-time detection performance of the improved YOLOv4-tiny model was verified on the NVIDIA TX2 and compared with that of the original YOLOv4-tiny model. As shown in Figure 12, some objects misclassified by the original model are correctly classified by the improved model, indicating that the improved model can effectively reduce intolerable misclassifications between different categories.
[figure(s) omitted; refer to PDF]
5. Conclusions
In this paper, an improved YOLO model for traffic obstacle detection and classification in ICVs is presented. A new dataset containing traffic obstacles collected at different times of day and under different weather conditions in an urban environment was established. The improved models, which reduce intolerable misclassifications and enhance traffic obstacle detection performance by combining the Wasserstein distance-based loss with the YOLO models, were designed and implemented. The improved models were trained and then tested on the established dataset and on selected data from the BDD dataset, and were deployed on the NVIDIA TX2 for real-time detection.
Experimental results showed that the mAP values of the improved YOLOv3, YOLOv4, and YOLOv4-tiny models on the established dataset are 98.57%, 98.19%, and 80.39%, respectively, higher than those of the corresponding original models. In terms of the application-oriented performance on the NVIDIA TX2, the detection speed of the improved YOLOv4-tiny model is 22.5928 fps, which is much higher than that of the YOLOv3 and YOLOv4 models and basically meets the real-time detection requirements for traffic obstacles. In addition, in the real-time on-vehicle verification, the improved YOLOv4-tiny model reduces intolerable misclassifications between different categories more effectively than the original model. In practical applications, the improved model could improve the accuracy of decision making for ICVs, thereby improving driving safety. In future work, the dataset could be enriched and the detection model further optimised.
Acknowledgments
This research was supported in part by the National Key R & D Program of China under grant no. 2018YFB0105205 and in part by the Hubei Province Technological Innovation Major Project under grant no. 2019AAA025.
[1] K. Q. Li, Y. F. Dai, S. B. Li, M. Y. Bian, "State-of-the-art and technical trends of intelligent and connected vehicles," Journal of Automotive Safety and Energy, vol. 8 no. 1, 2017.
[2] X. Y. Ma, W. E. L. Grimson, "Edge-based rich representation for vehicle classification," Proceedings of the 10th IEEE International Conference on Computer Vision, pp. 1185-1192, .
[3] Y. Taigman, M. Yang, M. Ranzato, "Deepface: closing the gap to human-level performance in face verification," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701-1708, .
[4] F. M. Kazemi, S. Samadi, H. R. Poorreza, "Vehicle recognition using curvelet transform and svm," Proceedings of the 4th IEEE International Conference on Information Technology, pp. 516-521, .
[5] Y. K. L. Lai, Y. H. C. Chou, T. Schumann, "Vehicle detection for forward collision warning system based on a cascade classifier using adaboost algorithm," Proceedings of the 7th IEEE International Conference on Consumer Electronics, pp. 47-48, .
[6] J. Long, E. Shelhamer, T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39 no. 4, 2017.
[7] Z. X. Zou, Z. W. Shi, Y. H. Guo, J. P. Ye, "Object detection in 20 years: a survey," 2019. https://arxiv.org/abs/1905.05055
[8] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You only look once: unified, real-time object detection," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779-788, .
[9] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, "Ssd: single shot multibox detector," pp. 21-37, .
[10] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, "Focal loss for dense object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42 no. 2, pp. 318-327, DOI: 10.1109/tpami.2018.2858826, 2020.
[11] R. Girshick, J. Donahue, T. Darrell, J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580-587, .
[12] K. He, X. Zhang, S. Ren, J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," pp. 346-361, .
[13] R. Girshick, "Fast r-cnn," Proceedings of the IEEE international conference on computer vision, pp. 1440-1448, .
[14] S. Ren, K. He, R. Girshick, J. Sun, "Faster r-cnn: towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, pp. 91-99, 2015.
[15] T. Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, S. J. Belongie, "Feature pyramid networks for object detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[16] J. Redmon, A. Farhadi, "YOLOv3: an incremental improvement," 2018. https://arxiv.org/abs/1804.02767
[17] A. Bochkovskiy, C. Y. Wang, H. Y. M. Liao, "YOLOv4: optimal speed and accuracy of object detection," 2020. https://arxiv.org/abs/2004.10934
[18] S. Y. Wang, T. Ahmad, "A real-time detection method of traffic targets based on YOLO," Computer & Digital Engineering, vol. 48 no. 9, pp. 2162-2167, 2020.
[19] A. Narayanan, R. Darshan Kumar, R. RoselinKiruba, T. Sree Sharmila, "Study and analysis of pedestrian detection in thermal images using YOLO and SVM," Proceedings of the 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pp. 431-434, .
[20] S. H. Hung, K. W. Chen, C. H. Chen, H. T. Chou, C. Y. Yao, "Real-time obstacle detection on embedded system," .
[21] H. Wang, X. Lou, Y. Cai, Y. Li, L. Chen, "Real-time vehicle detection algorithm based on vision and lidar point cloud fusion," Journal of Sensors, vol. 2019,DOI: 10.1155/2019/8473980, 2019.
[22] C. S. Arvind, R. Jyothi, K. Mahalakshmi, C. K. Vaishnavi, U. Apoorva, "Vision based driver assistance for near range obstacle sensing under unstructured traffic environment," Proceedings of the Proceedings of 2019 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1163-1170, .
[23] M. Zhang, R. Fu, Y. Guo, L. Wang, "Moving object classification using 3D point cloud in urban traffic environment," Journal of Advanced Transportation, vol. 2020,DOI: 10.1155/2020/1583129, 2020.
[24] J. Feng, F. Wang, S. Feng, Y. Peng, "A multibranch object detection method for traffic scenes," Computational Intelligence and Neuroscience, vol. 2019,DOI: 10.1155/2019/3679203, 2019.
[25] X. Li, Y. Liu, Z. Zhao, Y. Zhang, L. He, "A deep learning approach of vehicle multitarget detection from traffic video," Journal of Advanced Transportation, vol. 2018,DOI: 10.1155/2018/7075814, 2018.
[26] C. Wang, Y. Dai, W. Zhou, Y. Geng, "A vision-based video crash detection framework for mixed traffic flow environment considering low-visibility condition," Journal of Advanced Transportation, vol. 2020,DOI: 10.1155/2020/9194028, 2020.
[27] Y. Cai, T. Luan, H. Gao, H. Wang, L. Chen, Y. Li, M. A. Sotelo, Z. Li, "YOLOv4-5D: an effective and efficient object detector for autonomous driving," IEEE Transactions on Instrumentation and Measurement, vol. 70,DOI: 10.1109/tim.2021.3065438, 2021.
[28] M. Hnewa, H. Radha, "Object detection under rainy conditions for autonomous vehicles: a review of state-of-the-art and emerging techniques," IEEE Signal Processing Magazine, vol. 38 no. 1, pp. 53-67, DOI: 10.1109/msp.2020.2984801, 2021.
[29] Z. Liu, Y. Cai, H. Wang, L. Chen, H. Gao, Y. Jia, Y. Li, "Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions," IEEE Transactions on Intelligent Transportation Systems,DOI: 10.1109/TITS.2021.3059674, 2021.
[30] A. Bell, T. Mantecon, C. Diaz, C. R. del-Blanco, F. Jaureguizar, N. Garcia, "A novel system for nighttime vehicle detection based on foveal classifiers with real-time performance," IEEE Transactions on Intelligent Transportation Systems,DOI: 10.1109/TITS.2021.3053863, 2021.
[31] G. S. R. Satyanarayana, S. Majhi, S. K. Das, "A vehicle detection technique using binary images for heterogeneous and lane-less traffic," IEEE Transactions on Instrumentation and Measurement, vol. 70,DOI: 10.1109/tim.2021.3062412, 2021.
[32] M. Arjovsky, S. Chintala, L. Bottou, "Wasserstein generative adversarial networks," Proceedings of the International Conference on Machine Learning (ICML), pp. 214-223, .
[33] C. Villani, Optimal Transport: Old and New, vol. 338, 2008.
[34] C. Y. Wang, H. Y. M. Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, I. H. Yeh, "CSPNet: a new backbone that can enhance learning capability of cnn," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPR Workshop), .
[35] K. He, X. Zhang, S. Ren, J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37 no. 9, pp. 1904-1916, DOI: 10.1109/tpami.2015.2389824, 2015.
[36] S. Liu, L. Qi, H. F. Qin, J. P. Shi, J. Y. Jia, "Path aggregation network for instance segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8759-8768, DOI: 10.1109/cvpr.2018.00913, .
[37] N. Courty, R. Flamary, D. Tuia, A. Rakotomamonjy, "Optimal transport for domain adaptation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39 no. 9, pp. 1853-1865, 2016.
[38] G. Peyré, M. Cuturi, "Computational optimal transport," Foundations and Trends in Machine Learning, vol. 11 no. 5-6, pp. 355-607, 2019.
[39] N. Bonnotte, Unidimensional and Evolution Methods for Optimal Transportation, vol. 11, 2013. Ph. D. thesis
Copyright © 2022 Luyao Du et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
Mixed traffic is a common phenomenon in urban environments. In mixed traffic, the detection of traffic obstacles, including motor vehicles, non-motor vehicles, and pedestrians, is an essential task for intelligent and connected vehicles (ICVs). In this paper, an improved YOLO model is proposed for traffic obstacle detection and classification. The YOLO network is used to accurately detect traffic obstacles, while the Wasserstein distance-based loss is used to reduce misclassifications that may cause serious consequences. A newly established dataset containing four types of traffic obstacles (vehicles, bikes, riders, and pedestrians) was collected at different times of day and under different weather conditions in the urban environment of Wuhan, China. Experiments are performed on the established dataset on a Windows PC and an NVIDIA TX2, respectively. The experimental results show that the improved YOLO model achieves higher mean average precision than the original YOLO model and can effectively reduce intolerable misclassifications. In addition, the improved YOLOv4-tiny model reaches a detection speed of 22.5928 fps on the NVIDIA TX2, which can basically realize real-time detection of traffic obstacles.
1 School of Automation, Wuhan University of Technology, Wuhan 430070, China
2 Department of Computer Science, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK
3 School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
4 Wuhan Zhongyuan Electronics Group Co., Ltd., Wuhan 430205, China