1. Introduction
Target tracking is one of the research hotspots in computer vision and has been widely applied in military applications, autonomous driving, video surveillance, and other fields. Current target tracking algorithms [1] can be divided into three categories according to the observation model: methods based on generative models, methods based on discriminative models, and methods based on deep learning.
Methods based on generative models are also called classical target tracking algorithms. Such a method extracts the features of the target in the current frame, constructs a target appearance model, and searches the next frame for the region that best matches the appearance model, which serves as the predicted target position. Typical representatives are the particle filter, mean shift, and Kalman filter algorithms. Methods based on discriminative models treat target tracking as a classification or regression problem: the target is separated from the background by combining background information with feature extraction. The TLD (tracking-learning-detection) algorithm [2] is a representative long-term tracker of this kind. To handle target deformation, scale change, and occlusion during long-term tracking, TLD combines tracking with a traditional detection algorithm and updates the model and parameters online, making tracking more robust and reliable. Target tracking based on correlation filtering also belongs to the discriminative category. The minimum output sum of squared error (MOSSE) algorithm [3] applied correlation filtering to target tracking for the first time; by means of the fast Fourier transform, the computation is moved from the time domain to the frequency domain, and the tracking speed reaches 615 fps. This speed advantage demonstrated the potential of correlation filtering for target tracking. The KCF algorithm [4] computes the discriminant function by regression and introduces cyclic shifts for approximate dense sampling; the kernel trick maps the input to a high-dimensional space, and HOG features are added to improve tracking while keeping the computation fast. SRDCF [5] introduces spatial regularization and weights the filter coefficients so that they are concentrated in the central area, alleviating boundary effects.
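To make the frequency-domain idea concrete, the following is a minimal single-channel sketch of MOSSE-style correlation filtering in NumPy. It is not the implementation of [3]: the function names, the Gaussian label shape, and the regularization constant `lam` are illustrative assumptions.

```python
import numpy as np

def gaussian_peak(h, w, cy, cx, sigma=2.0):
    """Desired correlation output: a Gaussian centered on the target."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, response, lam=1e-3):
    """Solve for the filter in the frequency domain, elementwise:
    H* = (G . conj(F)) / (F . conj(F) + lam)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def locate(H_conj, patch):
    """Correlate the filter with a new patch; the response peak
    is the predicted target position."""
    resp = np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

Training on the first frame with a Gaussian response centered on the target and then calling `locate` on patches from subsequent frames reproduces the speed advantage noted above, since each step costs only a handful of FFTs.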
Among deep learning methods, C-COT [6] combines shallow appearance information and deep semantic information in deep features, fuses feature maps at multiple resolutions, interpolates the response map in the frequency domain, and then locates the target iteratively. The SiamRPN algorithm [7] proposes a Siamese network structure combined with an RPN: the Siamese network shares weights and maps the inputs to a new space to extract features, while the RPN generates candidate regions that are used to distinguish target from background and to fine-tune the candidates, achieving end-to-end input and output. The SiamMask algorithm [8] replaces the axis-aligned rectangular box previously used to represent the target position: it adds a mask branch to the Siamese architecture and generates a rotated rectangle from the target mask, further improving tracking accuracy.
The methods above address single-object tracking (SOT). In practical applications, target tracking is more often multiobject tracking (MOT) [9]: the targets are locked in a given video sequence, each target is distinguished in subsequent frames, and its motion trajectory is produced. According to how the target boxes are initialized, multitarget tracking methods fall into two categories: DBT (detection-based tracking) and DFT (detection-free tracking). DFT requires manual initialization of the target boxes and cannot handle targets that newly appear in the video; DBT can detect new targets automatically and terminate the trajectories of targets leaving the field of view. In multitarget tracking, the key problem [10] is the data association between detection nodes and existing trajectories and the correlation between trajectories. Xiang et al. [11] transformed the multitarget tracking problem into a Markov decision process (MDP): the target trajectory is assigned four different states, and the trajectory states and state transitions are described by MDP modeling and decision-making. The SORT algorithm [12] uses a Kalman filter to track each detected target, measures the distance between target boxes by IOU (intersection over union), and performs optimal association matching with the Hungarian algorithm. The Deep SORT algorithm [13] improves on SORT: Faster R-CNN is used to detect targets, and a Kalman filter is still used for tracking and prediction. For distance measurement, it fuses the Mahalanobis distance with the minimum cosine distance between the detection's feature vector and the set of recent deep features of each successfully tracked target, and it assigns matching priority through cascade matching, which solves the track association problem under target occlusion. Among deep learning-based multitarget tracking methods, Feng et al. [14] proposed a unified multitarget tracking framework: a SiamRPN network performs short-term target tracking, long-term appearance features of the target are integrated, and a ReID network improves tracking stability under occlusion and handles abnormal motion; on top of association matching, switcher-aware classification (SAC) is proposed, achieving a good multitarget tracking effect. However, due to the complexity of the model, the tracking speed is slow and cannot meet practical requirements.
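As a schematic illustration of how Deep SORT fuses the two distance terms described above, the sketch below combines a gated squared Mahalanobis distance with the minimum cosine distance over a track's recent appearance features. This is a simplification of [13], not its implementation: the helper names, the weight `lam`, and the chi-square gate value follow the published description but are assumptions here.

```python
import numpy as np

def mahalanobis_sq(track_mean, track_cov, detection):
    """Squared Mahalanobis distance between the Kalman-predicted
    measurement (mean, covariance) and a new detection."""
    d = detection - track_mean
    return float(d @ np.linalg.solve(track_cov, d))

def min_cosine_distance(track_features, det_feature):
    """Smallest cosine distance between a detection's appearance vector and
    the track's gallery of recent features (all L2-normalized, shape (k, d))."""
    return float(1.0 - np.max(track_features @ det_feature))

def combined_cost(track_mean, track_cov, track_features,
                  detection, det_feature, lam=0.0, gate=9.4877):
    """Fused cost in the style of Deep SORT: the Mahalanobis term gates out
    implausible pairs (9.4877 = 95% chi-square quantile for 4 degrees of
    freedom), and the weighted sum of both terms feeds the matching stage."""
    d_motion = mahalanobis_sq(track_mean, track_cov, detection)
    if d_motion > gate:
        return np.inf  # implausible association, excluded before matching
    return lam * d_motion + (1.0 - lam) * min_cosine_distance(track_features, det_feature)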
Continuously and accurately tracking multiple targets in complex traffic scenes remains an important research task. It is of great value for improving the utilization of traffic video surveillance data and for grasping road traffic information and regional road operation status in a timely and accurate manner. Cross-camera multitarget tracking overcomes the inability of a monocular camera to track accurately over long times and long distances, laying an important foundation for acquiring wide-area traffic information.
2. Principle of Multitarget Tracking
A traffic scene is a typical multitarget tracking application. This paper uses DBT detection boxes to realize multitarget vehicle tracking in traffic scenes. The processing flow of DBT-based multitarget tracking is shown in Figure 1: the target detector first detects the targets in each video frame to obtain and identify multiple target positions, and the multitarget tracking process then associates the current detection results with the existing target tracks to extend the tracks.
[figure omitted; refer to PDF]
Next, the problem of effective association between trajectories and targets must be solved. In cross-camera multitarget tracking, the first step is to obtain the multitarget vehicle trajectories within a single camera. Following the team's latest research results [15], the similarity between target boxes is calculated based on IOU, and the Hungarian algorithm completes the association between new detection nodes and existing vehicle trajectories; a definition and delimitation method for the stage and state of a trajectory is proposed to classify trajectories better (see the sketch below). Then, for cross-camera vehicle tracking, 3D trajectory reconstruction based on joint camera calibration in the overlapping area is solved, together with the similarity association between cross-camera trajectories and the cross-camera trajectory update, completing the trajectory transfer between adjacent cameras.
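The per-camera association step can be sketched as follows: build an IOU similarity matrix between existing track boxes and new detection boxes, and solve the assignment with the Hungarian algorithm via SciPy. This is a minimal sketch of the association described in [15], not its exact implementation; the threshold `iou_min` is an assumed parameter.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(track_boxes, det_boxes, iou_min=0.3):
    """Hungarian assignment on the IOU similarity matrix; pairs whose
    similarity falls below iou_min are rejected and left unmatched."""
    if not track_boxes or not det_boxes:
        return []
    sim = np.array([[iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(-sim)  # maximize total IOU
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= iou_min]
```

Unmatched detections start new trajectories and unmatched trajectories age toward termination, which is where the stage and state delimitation mentioned above applies.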
3. Data Association Based on Cross-Camera Calibration
Consider a multicamera surveillance scene with overlapping fields of view, as shown in Figure 2. A long road area is covered by many cameras; starting from the end of the monitoring area with the smaller camera numbers, the cameras are renumbered from 0 in turn. Each camera is responsible for monitoring one section of the road area; in Figure 2, different color blocks mark the monitoring area of each camera, and the view overlap between adjacent cameras is indicated in yellow. On the premise of cross-camera calibration, the similarity association can be completed by calculating the similarity matrix of vehicle trajectories between adjacent cameras. The basic idea is that, through the joint calibration of multiple cameras, all cameras are unified in one world coordinate system, and the similarity matrix is calculated from the Euclidean distances, in world coordinates, between the trajectory points of adjacent cameras (a minimal sketch follows Figure 2 below).
[figure omitted; refer to PDF]
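Assuming the road can be treated as a plane, so that each jointly calibrated camera is related to the common ground plane by a 3x3 homography, the similarity matrix between trajectories of adjacent cameras can be sketched as below. This is an illustrative simplification of the joint calibration described above: the homographies `H0` and `H1` and the time alignment of trajectory points in the overlap are assumptions.

```python
import numpy as np

def to_world(H, pts_img):
    """Project image points of shape (N, 2) onto the common ground plane
    using a 3x3 homography H obtained from the joint calibration."""
    pts = np.hstack([pts_img, np.ones((len(pts_img), 1))])
    w = (H @ pts.T).T
    return w[:, :2] / w[:, 2:3]

def trajectory_similarity(H0, H1, tracks0, tracks1):
    """Similarity matrix between trajectories of two adjacent cameras:
    mean Euclidean distance of time-aligned world-coordinate points in
    the overlap area (smaller value = more similar)."""
    S = np.zeros((len(tracks0), len(tracks1)))
    for i, ta in enumerate(tracks0):
        wa = to_world(H0, np.asarray(ta, dtype=float))
        for j, tb in enumerate(tracks1):
            wb = to_world(H1, np.asarray(tb, dtype=float))
            n = min(len(wa), len(wb))  # assumes points are time-aligned
            S[i, j] = np.linalg.norm(wa[:n] - wb[:n], axis=1).mean()
    return S
```

The trajectory pair with the smallest distance below a threshold is associated; this plays the same role across cameras that the IOU matrix plays within a single camera.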
Figure 6 shows successful matching of vehicles between adjacent cameras. When a target vehicle moves from the current camera toward the next camera, it passes through the overlapping area of the two cameras. The IDs of successfully matched vehicle targets must be unified, and the vehicle trajectory keeps its initial color attribute (a minimal sketch of this ID hand-over is given below). In Figure 7, as a black car drives from the field of view of camera 0 into camera 1, it is detected in both camera views within the overlapping area; the two cars connected by the yellow line are the positions of the black car under the two cameras. The target vehicle is matched in the overlapping area, and its information is transferred to camera 1.
[figure omitted; refer to PDF]
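The ID unification itself is then straightforward. The following minimal sketch (hypothetical names) lets each matched trajectory in camera 1 inherit the global ID, and hence the trajectory color, first assigned in camera 0:

```python
def hand_over_ids(matches, ids_cam0, ids_cam1):
    """For each matched trajectory pair (i, j) from the overlap area,
    camera 1's local track j inherits the global ID assigned in camera 0,
    so the vehicle keeps its initial ID and trajectory color."""
    for i, j in matches:
        ids_cam1[j] = ids_cam0[i]
    return ids_cam1
```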
After the two scenes are jointly calibrated across cameras, the vehicle trajectories can be drawn in a panoramic view reconstructed across the cameras of the surveillance scene. Taking the 70th frame of the panoramic reconstruction of multitarget vehicle tracking as an example, the entire overtaking process of a vehicle under the two cameras can be seen intuitively, as shown in Figure 9. This panoramic reconstruction makes it possible to observe the real operating state of the vehicles from a macro perspective, unaffected by the loss of occluded trajectories, and the reliability of the resulting data is a major technological breakthrough.
[figure omitted; refer to PDF]
In order to further verify the effectiveness of the proposed method, the trajectory coincidence degree TC is used for evaluation, and its definition formula is as follows:
[formula omitted; refer to PDF]
Here, m and n represent the numbers of discrete points on trajectory A and trajectory B, respectively.
Table 1
Coincidence of trajectories under different cameras.
Trajectory coincidence degree | Camera 0 (%) | Camera 1 (%) | Reconstruction (%)
TC | 8 | 23 | 5
As Table 1 shows, the proposed method unifies the cameras in one world coordinate system for target tracking and association matching. The trajectory coincidence degree is lowest in the reconstructed view (5%, versus 8% under camera 0 and 23% under camera 1), and the effect of trajectory-based target behavior analysis is comparable to observation from high altitude: the occlusion overlap that appears in 2D images does not arise, and the whole running state of a target in a large scene is reflected intuitively. Moreover, the proposed method meets real-time requirements.
6. Conclusion
Through joint calibration, multiple cameras are unified in one world coordinate system. The Euclidean distance between trajectory nodes in the overlapping area at the same time instant measures the similarity between trajectories, and the trajectory association matrix is calculated to match real trajectories in the current camera with new trajectories in the adjacent camera. Target tracking and association matching within a single camera and across cameras complete the trajectory transfer of vehicles between adjacent cameras and realize a 3D bird's-eye-view reconstruction of the vehicle trajectories. The results show that the operating state of vehicles can be observed from a true macro perspective with reliable data, which is a major breakthrough: it makes long-term and long-distance continuous tracking of multiple targets across cameras reliable and accurate.
Authors’ Contributions
Junfang Song and Tao Fan are mainly engaged in research on image processing and artificial intelligence. Huansheng Song is mainly engaged in research on image processing and recognition and intelligent transportation systems. Haili Zhao is mainly engaged in research on image processing and information security.
Acknowledgments
This work was supported by the Xizang Natural Science Foundation (nos. XZ202001ZR0065G and XZ202001ZR0046G), the Major Project of Xizang Minzu University (no. 19MDZ03), and the National Natural Science Foundation of China (nos. 62041305, 62072053, and 62062061).
[1] L. Xi, Y. Cha, T. Zhang, "Overview of deep learning target tracking algorithms," Journal of Image and Graphics, vol. 24, no. 12, pp. 2057-2080, 2019.
[2] Z. Kalal, K. Mikolajczyk, J. Matas, "Tracking-learning-detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409-1422, 2012.
[3] D. S. Bolme, J. R. Beveridge, B. A. Draper, Y. M. Lui, "Visual object tracking using adaptive correlation filters," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544-2550, DOI: 10.1109/cvpr.2010.5539960, 2010.
[4] J. F. Henriques, R. Caseiro, P. Martins, J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, DOI: 10.1109/tpami.2014.2345390, 2015.
[5] M. Danelljan, G. Hager, F. Shahbaz Khan, M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4310-4318, DOI: 10.1109/iccv.2015.490, 2015.
[6] M. Danelljan, A. Robinson, F. Shahbaz Khan, M. Felsberg, "Beyond correlation filters: learning continuous convolution operators for visual tracking," Proceedings of the European Conference on Computer Vision (ECCV), pp. 472-488, DOI: 10.1007/978-3-319-46454-1_29, 2016.
[7] B. Li, J. Yan, W. Wu, Z. Zhu, X. Hu, "High performance visual tracking with siamese region proposal network," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8971-8980, DOI: 10.1109/cvpr.2018.00935, 2018.
[8] Q. Wang, L. Zhang, L. Bertinetto, W. Hu, P. H. S. Torr, "Fast online object tracking and segmentation: a unifying approach," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1328-1338, DOI: 10.1109/cvpr.2019.00142, 2019.
[9] W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, T.-K. Kim, "Multiple object tracking: a literature review," Artificial Intelligence, vol. 293, article 103448, DOI: 10.1016/j.artint.2020.103448, 2021.
[10] P. Emami, P. M. Pardalos, L. Elefteriadou, S. Ranka, "Machine learning methods for solving assignment problems in multi-target tracking," arXiv preprint arXiv:1802.06897, 2020, https://arxiv.org/abs/1802.06897.
[11] Y. Xiang, A. Alahi, S. Savarese, "Learning to track: online multi-object tracking by decision making," Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4705-4713, DOI: 10.1109/iccv.2015.534, 2015.
[12] A. Bewley, Z. Ge, L. Ott, F. Ramos, B. Upcroft, "Simple online and realtime tracking," Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3464-3468, DOI: 10.1109/icip.2016.7533003, 2016.
[13] N. Wojke, A. Bewley, D. Paulus, "Simple online and realtime tracking with a deep association metric," Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3645-3649, DOI: 10.1109/icip.2017.8296962, 2017.
[14] W. Feng, Z. Hu, W. Wu, J. Yan, W. Ouyang, "Multi-object tracking with multiple cues and switcher-aware classification," arXiv preprint arXiv:1901.06129, 2019, https://arxiv.org/abs/1901.06129.
[15] Y. Li, Research on Traffic Video Intelligent Analysis System Based on Target Detection and Tracking, 2019.
[16] J. Song, H. Song, S. Wang, "PTZ camera calibration based on improved DLT transformation model and vanishing point constraints," Optik, vol. 225, article 165875, DOI: 10.1016/j.ijleo.2020.165875, 2021.
Copyright © 2022 Junfang Song et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
In traffic scenarios, vehicle trajectories can provide almost all the dynamic information of moving vehicles. Analyzing vehicle trajectories in surveillance scenes makes it possible to grasp dynamic road traffic information. Cross-camera association of vehicle trajectories across multiple cameras can break the isolation of target information between single cameras and obtain the overall road operation conditions in a large-scale video surveillance area, which helps road traffic managers conduct traffic analysis, prediction, and control. Based on the DBT automatic target detection framework, this paper proposes a cross-camera vehicle trajectory matching method based on the Euclidean distance correlation of trajectory points. For the multitarget vehicle trajectories acquired within a single camera, we first perform 3D trajectory reconstruction based on joint camera calibration in the overlapping area, then complete the similarity association between cross-camera trajectories and the cross-camera trajectory update, and finally complete the trajectory transfer of vehicles between adjacent cameras. Experiments show that the proposed method solves the difficulty current tracking technology has in matching vehicle trajectories under different cameras in complex traffic scenes and essentially achieves long-term, long-distance continuous tracking and trajectory acquisition of multiple targets across cameras.
Author Affiliations
1 School of Information Engineering, Xizang Minzu University, Xianyang, Shaanxi 712082, China; Key Laboratory of Optical Information Processing and Visualization Technology of Tibet Autonomous Region, Xianyang, Shaanxi 712082, China
2 School of Information Engineering, Chang’an University, Xi’an 710064, China
3 School of Information Engineering, Xizang Minzu University, Xianyang, Shaanxi 712082, China