1. Introduction
Car sharing services provide customers with access to shared vehicles for short-term use. They can reduce inner-city traffic, trip cost, congestion, and environmental pollution and have developed rapidly in recent years. To achieve better safety and operating efficiency, more and more intelligent vehicle technologies have been utilized in car sharing services [1, 2]. Visual object tracking is a fundamental component of these technologies: given an object's initial location in the first frame, its locations in subsequent frames can be estimated continually. Moreover, the object's trajectory and velocity can be calculated from the tracking results and used for augmented or automatic driving of shared vehicles. Compared with radar tracking, visual tracking is cheaper and perceives richer semantic information about the traffic scene. However, factors such as real-time variations in illumination, weather conditions, and interactions between traffic elements often degrade tracking performance in complex traffic scenes. Therefore, there is still substantial room for the development of visual object tracking for car sharing services.
A typical visual object tracking method consists of five components, namely, feature extraction, motion model, appearance model, model updating, and integration process [3]. Most studies focus on feature extraction and the appearance model. The features used for object tracking include hand-crafted features such as Color, HOG, LBP, and CN as well as automatically learned convolutional features. The main appearance models can be classified into generative and discriminant ones and receive much attention. By contrast, the model updating component is less studied. Most object trackers use the simplest linear weighting for model updating, in which a new appearance model is obtained by linearly weighting the old model and the tracking result of the current frame. The drawback of this method is that the weight factor of the current frame is kept fixed and has no connection with the tracking performance of the current frame during the updating process. In fact, if the tracking result of the current frame is reliable and the object is not occluded, a small weight factor for the current tracking result may cause the appearance model to be updated inadequately. On the contrary, if the tracking result of the current frame is inaccurate or the object is occluded, a large weight factor may cause the appearance model to be updated improperly. In both situations, errors may be introduced into the appearance model, and as the updating proceeds, these errors may accumulate and make the appearance model drift away from the object. From the above analysis, we can see that it is necessary to assign a suitable weight factor according to an evaluation of the current tracking performance. Nonetheless, how to update the tracking model online based on an analysis of the current tracking performance is still an open problem. This study tries to bridge this research gap, and the main contributions are as follows:
(1) Introduce an object-specific IOU predictor, trained offline on a large number of image pairs, to estimate the performance of the current tracking result for object model updating.
(2) Propose a dynamic updating mechanism based on IOU prediction. The updating principle is to assign the current tracking result a larger weight if it is relatively accurate and complete and a smaller weight otherwise.
(3) Integrate the IOU predictor into a lightweight correlation filter tracker and update the tracker online using the proposed updating mechanism.
This paper is organized as follows: Section 2 reviews related work. Section 3 introduces the baseline object tracker and the IOU predictors used in computer vision and proposes our visual object tracker with online updating. Section 4 presents the experimental results and corresponding analysis. Finally, Section 5 presents the conclusions and future research directions.
2. Related Work
As mentioned above, existing visual object tracking methods can be divided into two categories: generative ones and discriminant ones. In the generative methods, the appearance model contains only the object's information, and object tracking is achieved by searching for the candidate region that best matches the appearance model. Template tracking is the earliest generative tracking method; it takes the original spatial intensity distribution of the object region as the template and tracks the object by template matching. Aiming at the drift problem caused by inadequate template updating, Matthews et al. [4] kept the first template and used it to align the current template, which reduced drift to a certain extent. As another classical generative tracking method, the mean shift method [5] takes the object's kernel histogram in the first frame as the appearance model and employs a metric derived from the Bhattacharyya coefficient as the similarity measure to perform the matching. Throughout this tracking process, the appearance model remains unchanged. To update the appearance model dynamically, Peng et al. [6] employed a Kalman filter to filter the kernel histogram using the previous appearance model and the current candidate region. The modified method could partly keep up with changes in object appearance, but its hidden assumption that the object appearance obeys a Gaussian distribution may not hold in many practical situations. Besides the intensity template and the kernel histogram, the low-dimensional linear subspace is also a generative appearance model; it was first introduced into object tracking by Hager and Belhumeur [7] to handle appearance variations caused by illumination. To update the linear subspace model adaptively, Ross et al. [8] proposed an incremental learning-based tracking method that collects the object locations in previous frames and employs incremental PCA to update the linear subspace model. Through this updating operation, the linear subspace model can adapt better to variations in object appearance.
Different from the generative object tracking methods, the discriminant methods consider not only the object's information but also the background's information. They treat object tracking as a binary classification problem, train a classifier to separate the object from the background, and have attracted more attention due to their ability to deal with objects in complex environments. Most traditional tracking-by-detection methods train their binary classifiers online to update the appearance model, and the updating process always has two steps: (i) the generation and labelling of samples based on the estimated object locations in previous frames and (ii) the online updating of the classifiers [9]. However, the generated samples' labels are often noisy. To increase the classifiers' robustness to poorly labelled samples, several improvements such as robust loss functions [10, 11], semisupervised learning [12, 13], and multiple instance learning [14, 15] have been proposed.
With the fast development of deep learning, modern visual object trackers such as correlation filtering-based trackers and siamese trackers generally use deep features to build their appearance models, and the corresponding model updating mechanisms have also been studied. The MOSSE filter [16], the first correlation filtering-based tracker, updates the object model by linearly weighting the current estimated object region and the previous object model, and this linear weighting method has also been used in many other correlation filtering-based trackers [17–20]. The siamese tracker is another kind of modern object tracker, whose basic principle is to learn a similarity metric offline and search online for the candidate region that best matches the object appearance template. SiamFC is the original siamese tracker, in which the object template is initialized in the first frame and then kept fixed for the remainder of the video [21]. Most siamese trackers [22–24] implement the same model updating strategy as MOSSE, and there are two problems in the updating of these trackers. First, the weight factor of the current frame is fixed and cannot change adaptively during updating. Second, only the object information is updated, while the updating of the background information is ignored. Aiming at the second limitation, Huang et al. [25] modeled the context between the object and its surroundings by an object-aware weight vector and took the spatial-temporal context into account in the updating process. Besides the above, there are some learning-based model updating methods. Taking the initial template, the accumulated template, and the template of the current frame as inputs, Zhang et al. [26] utilized a convolutional neural network to learn the optimal template of the next frame in an offline way. Li et al. [27] learned an RNN-based model updater on offline videos by metalearning. In general, to make the learned mechanism adapt to arbitrary targets, these learning-based model updating methods need a large number of samples covering different kinds of appearance variation.
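As a concrete illustration, the fixed-rate linear weighting update shared by MOSSE and many later trackers amounts to the following minimal sketch; the function name and the rate value are illustrative, not taken from any particular implementation.

```python
def linear_update(old_model, new_observation, alpha=0.0125):
    """Fixed-rate linear interpolation between the previous appearance model
    and the model estimated from the current frame (the common update rule).
    With a fixed alpha, the update ignores whether the current frame's
    tracking result is reliable, which is the limitation discussed above."""
    return (1.0 - alpha) * old_model + alpha * new_observation
```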
To consider feedback from tracking results in object model updating, Wang et al. [28] used the response map's peak value and the average peak-to-correlation energy (APCE) to measure the confidence of the current tracking result. The object model was updated only if these indexes exceeded certain thresholds and remained unchanged otherwise. Similarly, Sun et al. [29] calculated the peak-to-sidelobe ratio (PSR) of the response map to evaluate the quality of the tracking result in each frame and used it to update the template of a siamese tracker. In addition, Zhu et al. [30] took the peak-versus-noise ratio (PNR) as an evaluation index: when the PNR and the maximum value of the response map exceeded certain thresholds simultaneously, one-step stochastic gradient descent with a small learning rate was used to update the object model.
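For reference, the two most common response-map confidence indexes can be sketched as follows under their usual definitions; the peak-exclusion window size in the PSR is an illustrative choice, as the cited papers use various window sizes.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a response map (cf. [28])."""
    r_max, r_min = response.max(), response.min()
    return (r_max - r_min) ** 2 / np.mean((response - r_min) ** 2)

def psr(response, half_win=5):
    """Peak-to-sidelobe ratio (cf. [29]): peak height relative to the
    statistics of the sidelobe region around the excluded peak window."""
    py, px = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - half_win):py + half_win + 1,
         max(0, px - half_win):px + half_win + 1] = False
    sidelobe = response[mask]
    return (response[py, px] - sidelobe.mean()) / (sidelobe.std() + 1e-8)
```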
In summary, most modern object trackers update their appearance models without considering whether the estimated object location is accurate or not. Once the object is estimated inaccurately, severely occluded, or totally missing in the current frame, the object model will be updated improperly, and the impact will accumulate throughout the tracking. A few studies have used APCE, PSR, or PNR to measure the confidence of the current tracking result. These rule-based indicators can be calculated from the response map easily and rapidly, but much of the information in the raw images is thrown away in the calculation, which limits their ability to evaluate tracking performance. Different from them, in this paper, we introduce a data-based method to evaluate the performance of tracking results and use it as guidance to update the object model online. For the reader's reference, Table 1 summarizes the main symbols used in the following and their descriptions.
Table 1
Symbols summary.
Symbol | Description
$x$ | Target region
$y$ | Desired response of the correlation filter
$z$ | Search region
$\mathbf{w}$ | Correlation filter
$g$ | Response map of the correlation filter
$\varphi_\theta$ | Feature extraction network
$\theta$ | Parameters of $\varphi_\theta$
$L$ | Objective function of $\varphi_\theta$
$D$ | Size (channel number) of the extracted feature
$\varepsilon_t$ | Accumulated ridge loss
$\beta_t$ | Updating rate at time $t$
$\lambda$ | Regularization coefficient
$\mathcal{F}$ | Discrete Fourier transform
$B$ | Bounding box
$v$ | Modulation vector
$f$ | Feature representation of the test image
$P$ | IOU predictor module
$\mathrm{IOU}(B_t)$ | Predicted IOU of bounding box $B_t$
$T_k$ | IOU thresholds
$\eta_k$ | Predefined updating rates
3. Object Tracking with Online Updating Guided by IOU
3.1. Base Object Tracker
The features used in traditional discriminant correlation filtering-based object tracking methods are either hand-crafted features like HOG, LBP, and CN or convolutional features trained independently on other visual tasks like image classification and object detection. This separation between feature learning and correlation tracking makes the achieved tracking performance suboptimal. Aiming at this problem, Wang et al. [20] proposed DCFNet, an end-to-end lightweight network architecture that learns the convolutional features and performs the correlation tracking process simultaneously. Because of its high efficiency and performance, we use it as the base object tracker in this work.
In DCFNet, given the target region $x$ and the desired Gaussian-shaped response $y$, the correlation filter $\mathbf{w}$ is learned on the feature representation $\varphi_\theta(x)$ by minimizing the ridge loss

$$\varepsilon(\mathbf{w}) = \left\| \mathbf{w} \star \varphi_\theta(x) - y \right\|^2 + \lambda \left\| \mathbf{w} \right\|^2, \tag{1}$$

where $\star$ denotes circular correlation and $\lambda$ is the regularization coefficient. The minimizer has a closed-form solution in the Fourier domain,

$$\hat{\mathbf{w}}_l = \frac{\hat{y} \odot \overline{\hat{\varphi}_l(x)}}{\sum_{k=1}^{D} \hat{\varphi}_k(x) \odot \overline{\hat{\varphi}_k(x)} + \lambda},$$

and the learned filter is updated incrementally over time with the updating rate $\beta_t$:

$$\hat{\mathbf{w}}_t = (1-\beta_t)\,\hat{\mathbf{w}}_{t-1} + \beta_t\,\hat{\mathbf{w}}, \tag{2}$$

where $\hat{\mathbf{w}}$ is the filter computed from the current frame by equation (1).

Here, hat denotes the discrete Fourier transform $\mathcal{F}$, the bar denotes complex conjugation, $\odot$ denotes element-wise multiplication, the subscript $l \in \{1, \dots, D\}$ indexes the feature channels, and $D$ is the size (channel number) of the extracted feature. Given a search region $z$ in a new frame, the response map of the correlation filter is

$$g = \mathcal{F}^{-1}\left( \sum_{l=1}^{D} \overline{\hat{\mathbf{w}}_l} \odot \hat{\varphi}_l(z) \right), \tag{3}$$

and the new target location is estimated at the position of the maximum of $g$.

The feature extraction network $\varphi_\theta$ with parameters $\theta$ is a lightweight fully convolutional network. Different from trackers that adopt features pretrained on other visual tasks, DCFNet treats the correlation filter as a special network layer and back-propagates the tracking objective $L$ through the closed-form solution above, so the features are learned end-to-end for the tracking task itself, which keeps the tracker both accurate and fast.
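To make equations (1)–(3) concrete, the following is a minimal NumPy sketch of the closed-form filter learning and the response computation; the feature extraction step is assumed to be given (the `feat` arrays stand for $\varphi_\theta(x)$ and $\varphi_\theta(z)$), and the variable names are illustrative.

```python
import numpy as np

def learn_filter(feat, y, lam=1e-4):
    """Closed-form solution of equation (1) in the Fourier domain.
    feat: target-region features, shape (D, H, W); y: desired Gaussian
    response, shape (H, W); lam: regularization coefficient."""
    F = np.fft.fft2(feat, axes=(-2, -1))            # \hat{phi}_l(x)
    Y = np.fft.fft2(y)                              # \hat{y}
    denom = (F * np.conj(F)).sum(axis=0).real + lam  # shared denominator
    return Y[None, ...] * np.conj(F) / denom        # \hat{w}_l per channel

def response(w_hat, feat_z):
    """Response map on a search region z, cf. equation (3)."""
    Z = np.fft.fft2(feat_z, axes=(-2, -1))
    g = np.fft.ifft2((np.conj(w_hat) * Z).sum(axis=0)).real
    return g  # the new target location is the argmax of g
```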
3.2. IOU Prediction
IOU is defined as the ratio of the intersection area between the candidate object region and the ground-truth region to their union area. It evaluates the accuracy of the candidate region relative to the ground truth and is useful in many visual tasks. IOU prediction was first implemented by IOU-Net [31] in object detection, in which an IOU-Net is trained independently for each object class and is therefore unsuitable for other classes of objects. Such class-specific IOU predictors are of little use for generic visual tracking because the object's class is generally unknown and arbitrary. To predict the IOU of candidate regions for arbitrary objects in visual tracking, Danelljan et al. [32] proposed an IOU predictor that, given only a single reference image, can predict an arbitrary object's IOU through a modulation-based network architecture, as shown in Figure 1.
Figure 1: Architecture of the modulation-based IOU predictor network [32] (figure omitted).
As shown in Figure 1, the IOU predictor network has two branches, and both of them take specific convolution layers of ResNet-18 as the backbone. The reference branch accepts the convolution features of the reference image together with its ground-truth bounding box $B_0$ and computes a modulation vector $v$ that encodes the appearance of the specific target. The test branch extracts the feature representation $f$ of the test image pooled over a candidate bounding box $B$; $f$ is then multiplied channel-wise by $v$, and the modulated representation is fed into the IOU prediction module $P$, which outputs the predicted IOU of $B$. Because the target-specific information enters only through the modulation vector, the predictor generalizes to arbitrary objects; it is trained offline on a large number of image pairs by minimizing the error between the predicted IOU and the ground-truth IOU.
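The sketch below outlines this two-branch modulation design in PyTorch. It is a simplified illustration, not the implementation of [32]: the layer sizes are assumptions, and the actual predictor uses precise ROI pooling over ResNet-18 block-3/4 features rather than the plain global pooling used here.

```python
import torch
import torch.nn as nn

class IouPredictor(nn.Module):
    """Schematic modulation-based IOU predictor in the spirit of [32]."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Reference branch: target appearance + its box -> modulation vector v.
        self.ref_conv = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)
        self.ref_fc = nn.Linear(feat_dim, feat_dim)
        # Test branch: candidate-box features f, modulated channel-wise by v.
        self.test_conv = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)
        self.iou_head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, ref_roi_feat, test_roi_feat):
        # Both inputs: (N, C, h, w) features pooled over the reference box B0
        # and a candidate box B, respectively.
        v = self.ref_fc(self.ref_conv(ref_roi_feat).mean(dim=(2, 3)))
        f = self.test_conv(test_roi_feat).mean(dim=(2, 3))
        return self.iou_head(f * v)  # predicted IOU of B, shape (N, 1)
```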
3.3. Online Updating of the Base Object Tracker with the Guidance of IOU
As in many discriminant correlation filtering-based object trackers, DCFNet has an incremental model updating mechanism as shown in equation (2). In the updating process, the updating rate $\beta_t$ is fixed to a constant $\beta$ for every frame. Unrolling equation (2) from the first frame expresses the current filter as a weighted combination of the per-frame filters,

$$\hat{\mathbf{w}}_t = \sum_{i=1}^{t} \alpha_i \hat{\mathbf{w}}^{(i)}, \tag{4}$$

$$\alpha_i = \beta (1-\beta)^{t-i} \ (i > 1), \quad \alpha_1 = (1-\beta)^{t-1}, \tag{5}$$

so that $\hat{\mathbf{w}}_t$ can be interpreted as an approximate minimizer of the accumulated ridge loss

$$\varepsilon_t = \sum_{i=1}^{t} \alpha_i \left( \left\| \mathbf{w} \star \varphi_\theta(x_i) - y \right\|^2 + \lambda \left\| \mathbf{w} \right\|^2 \right). \tag{6}$$
The assumption behind equation (6) is that the estimated object regions in different frames are equally reliable: each frame enters the model with the same rate $\beta$ regardless of its tracking quality. Obviously, this does not hold in many cases. For example, if the object is occluded at time $t$, the estimated region $x_t$ contains mostly background and occluder pixels, and updating the model with the same rate as an accurate frame injects errors into the filter that accumulate as tracking proceeds. Conversely, when the tracking result is accurate, a conservatively small fixed rate prevents the model from adequately absorbing the new appearance information. Therefore, $\beta_t$ should be adjusted dynamically according to the quality of the current tracking result.
In fact, the evaluation of tracking performance has received some attention and has been used for model updating in visual tracking. In most existing methods, the reliability of the tracking result is expressed by statistical indexes such as APCE, PSR, and PNR. These indexes are defined manually and calculated from an intermediate response map, so the original information contained in the tracking result, such as color, texture, and intensity, is ignored in the evaluation. Different from them, we introduce the IOU to measure the reliability of the tracking result and use it as guidance to update the object tracker online. As shown in Figure 2, the original DCFNet is supplemented with an IOU predictor to constitute a new tracker, in which the architectures of the two networks remain unchanged.
Figure 2: The proposed tracker: DCFNet supplemented with an IOU predictor for online updating (figure omitted).
Because of the prediction error, it is hard and unnecessary to adjust the updating rate $\beta_t$ continuously according to the exact value of the predicted IOU. Instead, the predicted IOU of the estimated bounding box $B_t$ is quantized into intervals divided by the IOU thresholds $T_k$, and each interval is associated with a predefined updating rate $\eta_k$:

$$\beta_t = \eta_k \quad \text{if} \ T_{k-1} \le \mathrm{IOU}(B_t) < T_k, \quad k = 1, \dots, K, \tag{7}$$

with $T_0 = 0$ and $T_K = 1$. In this way, an accurate and complete tracking result updates the object model with a larger rate, while an inaccurate or occluded result updates it with a smaller rate or leaves it essentially unchanged.
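A minimal sketch of the quantization step in equation (7) follows; the threshold and rate values are illustrative placeholders, not the tuned settings evaluated as conditions 1–6 in Section 4.

```python
def select_update_rate(pred_iou, thresholds=(0.4, 0.6), rates=(0.0, 0.005, 0.01)):
    """Pick a predefined updating rate eta_k by quantizing the predicted IOU.
    thresholds: IOU thresholds T_k in ascending order; rates: one rate per
    interval (len(rates) == len(thresholds) + 1). Values are placeholders."""
    k = sum(pred_iou >= t for t in thresholds)  # interval index 0..len(thresholds)
    return rates[k]
```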
Compared with the manually defined statistical indexes such as APCE, PSR, and PNR, the predicted IOU of the estimated object region $B_t$ is computed directly from the raw image content rather than from an intermediate response map, so it preserves information such as color, texture, and intensity and reflects the tracking quality more faithfully. In summary, in each frame, the base tracker first estimates the object region, the IOU predictor then evaluates the estimated bounding box, and finally the updating rate $\beta_t$ is selected by equation (7) before the object model is updated by equation (2).
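Putting the pieces together, one tracking step of the proposed pipeline can be sketched as below. The names `tracker.localize` and `iou_predictor` are hypothetical wrappers around the DCFNet forward pass and the IOU prediction network, and `select_update_rate` is the quantization sketch above.

```python
def track_frame(tracker, iou_predictor, frame, model):
    """One frame of the proposed pipeline (illustrative interfaces)."""
    box, new_filter = tracker.localize(frame, model)  # base DCFNet step: argmax of g
    pred_iou = iou_predictor(frame, box)              # evaluate the estimated box
    beta = select_update_rate(pred_iou)               # piecewise rate, equation (7)
    model = (1.0 - beta) * model + beta * new_filter  # equation (2), adaptive rate
    return box, model
```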
4. Experiments
4.1. Experiment Settings
To verify the effectiveness of the proposed object tracker, we first conduct extensive experiments on 2 challenging public datasets: OTB-2013 [33] with 50 sequences and its updated version OTB-2015 [34] with 100 sequences. Without loss of generality, the datasets contain both traffic and nontraffic scenes. The hardware environment includes an Intel E5-2687 3.0GHz CPU, 128GB RAM, and an Nvidia 1080Ti GPU. We implement our object tracker in PyTorch and compare it with 8 other modern object trackers: SRDCF [35], Staple [36], SiamFC [21], CFNet [37], the original DCFNet [20], and 3 modified versions of DCFNet that update their object models using APCE, PSR, and PNR, respectively.
For a fair comparison with the original DCFNet, the model updating rate of the original DCFNet and all other shared hyperparameters are kept at their default values, and our tracker is evaluated under six combinations of the IOU thresholds $T_k$ and the predefined updating rates $\eta_k$, denoted as conditions 1–6 in Table 2.
4.2. Experimental Results
The tracking performance of each object tracker is estimated by one-pass evaluation (OPE). Figure 3 shows the success plots of OPE on OTB-2013 and OTB-2015 for the proposed tracker under condition 1 and for the other trackers; the numbers in the legends indicate the average area under curve (AUC) scores of all trackers. A more complete quantitative comparison between our tracker under all conditions and the other trackers is given in Table 2.
Figure 3: Success plots of OPE on OTB-2013 and OTB-2015 (figures omitted).
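For reference, the AUC score of a success plot can be computed from per-frame IOUs as in the sketch below, using the 21 overlap thresholds common to the OTB protocol.

```python
import numpy as np

def success_auc(ious, thresholds=np.linspace(0, 1, 21)):
    """Area under the OPE success plot: for each overlap threshold, the
    fraction of frames whose IOU with the ground truth exceeds it,
    averaged over all thresholds (the AUC score reported in Table 2)."""
    ious = np.asarray(ious)
    return float(np.mean([(ious > t).mean() for t in thresholds]))
```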
Table 2
The average AUC scores of all trackers in the experiments.
Method | Conditions | OTB-2013 | OTB-2015
Our method | Condition 1 | 0.6465 | 0.6083
Our method | Condition 2 | 0.6429 | 0.6056
Our method | Condition 3 | 0.6420 | 0.6069
Our method | Condition 4 | 0.6465 | 0.6096
Our method | Condition 5 | 0.6428 | 0.6082
Our method | Condition 6 | 0.6423 | 0.6097
DCFNet | — | 0.6144 | 0.6035
DCFNet-APCE | The same as in reference [28] | 0.6029 | 0.5793
DCFNet-PSR | The same as in reference [29] | 0.6247 | 0.5958
DCFNet-PNR | The same as in reference [30] | 0.5925 | 0.5565
CFNet | — | 0.6016 | 0.5839
Staple | — | 0.5839 | 0.5727
SRDCF | — | 0.6168 | 0.5932
SiamFC | — | 0.6051 | 0.5832
In addition to the above experiments on OTB-2013 and OTB-2015, a group of experiments is conducted on KITTI [38], a vision benchmark for autonomous driving, to prove the feasibility of the proposed method in traffic scenes. Partial experimental results are shown in Figure 4; for viewing convenience, the KITTI images are cropped to narrow the field of view.
Figure 4: Example tracking results on KITTI sequences (figure omitted).
4.3. Experimental Analysis
It can be seen from Table 2 that the proposed method achieves the highest tracking accuracy under all conditions and therefore has a certain degree of robustness to hyperparameter selection. Taking the results under condition 1 as an example, the tracking accuracy of our method increases by 5% on OTB-2013 and 1% on OTB-2015 compared with that of the original DCFNet. This improvement verifies that the proposed IOU-guided dynamic update mechanism adapts better to variations in object appearance than the fixed update mechanism used in the original DCFNet. Furthermore, our method also exceeds the DCFNet variants modified with APCE, PSR, and PNR, respectively. The reason is that these rule-based evaluation indicators are easily affected by irregularities and noise in the response map. By contrast, as a data-based evaluation indicator learned from a large number of videos, the IOU used in our method evaluates the tracking results more realistically. The price, however, is that the tracking speed decreases from 80 FPS to 30 FPS because of the IOU calculation.
In addition, the experimental results shown in Figure 4 demonstrate that our tracking method with online updating can track traffic participants well, which benefits the operational efficiency and safety of car sharing services.
5. Conclusion and Future Work
Visual object trackers can acquire the trajectories of objects such as pedestrians and vehicles in traffic scenes and make car sharing services more secure and efficient. To improve tracking performance in complex traffic scenes, it is necessary to update the object model adaptively, and an accurate evaluation of the current tracking result benefits the updating of the object appearance model. Instead of using rule-based indicators such as APCE, PSR, and PNR, we introduced a data-based IOU predictor, learned offline from a large number of image pairs, to evaluate the tracking result. Based on the predicted IOU, a dynamic updating mechanism for the object model was proposed: if the predicted IOU is high, a larger weight is assigned to the current tracking result, and a smaller weight otherwise. Finally, we integrated this dynamic updating mechanism into the DCFNet tracker. Experimental results showed that, compared with the original tracker, the proposed tracker's accuracy increased by 5% on OTB-2013 and 1% on OTB-2015. Moreover, our tracker also exceeds the modified DCFNet trackers that update their object models using APCE, PSR, and PNR, respectively. This verifies that, as a data-based tracking performance evaluation index, IOU can act as more reliable guidance than rule-based indexes for updating the object appearance model online and improving the accuracy of object tracking for car sharing services.
The limitation of our research is that, because of the additional computation of IOU prediction, the tracking speed decreases from 80 FPS to 30 FPS. Future research may include backbone network sharing, network structure search, and model compression of the IOU prediction network to improve the accuracy and speed of the IOU predictor.
Acknowledgments
This work was partially supported by the High-Level Talents of Jinling Institute of Technology (No. JIT-B-202013), the International Science and Technology Cooperation Project of Jiangsu Province (No. BZ2020069), the Research Fund for the Doctoral Program of Jinling Institute of Technology (No. JIT-B-201617), and the Major Program of University Natural Science Research of Jiangsu Province (No. 16KJA520003).
[1] K. Lu, J. Li, L. Zhou, X. Hu, X. An, H. He, "Generalized haar filter-based object detection for car sharing services," IEEE Transactions on Automation Science and Engineering, vol. 15 no. 4, pp. 1448-1458, DOI: 10.1109/tase.2018.2830655, 2018.
[2] S. A. Shaheen, M. A. Mallery, K. J. Kingsley, "Personal vehicle sharing services in north America," Research in Transportation Business & Management, vol. 3, pp. 71-81, DOI: 10.1016/j.rtbm.2012.04.005, 2012.
[3] N. Y. Wang, J. P. Shi, D. Y. Yeung, J. Jia, "Understanding and diagnosing visual tracking systems," Proceedings of the IEEE International Conference on Computer Vision, pp. 3101-3109, .
[4] L. Matthews, T. Ishikawa, S. Baker, "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26 no. 6, pp. 810-815, DOI: 10.1109/tpami.2004.16, 2004.
[5] D. Comaniciu, V. Ramesh, P. Meer, "Real-time tracking of non-rigid objects using mean shift," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, .
[6] N. S. Peng, J. Yang, E. Q. Liu, "Model update mechanism for mean-shift tracking," Journal of Systems Engineering and Electronics, vol. 16 no. 1, pp. 52-57, DOI: 10.1360/jos161542, 2005.
[7] G. D. Hager, P. N. Belhumeur, "Real-time tracking of image regions with changes in geometry and illumination," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 403-410, .
[8] D. A. Ross, J. Lim, R.-S. Lin, M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77 no. 1–3, pp. 125-141, DOI: 10.1007/s11263-007-0075-7, 2008.
[9] S. Hare, S. Golodetz, A. Saffari, V. Vineet, M.-M. Cheng, S. L. Hicks, P. H. S. Torr, "Struck: structured output tracking with kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38 no. 10, pp. 2096-2109, DOI: 10.1109/tpami.2015.2509974, 2016.
[10] C. Leistner, A. Saffari, P. M. Roth, H. Bischof, "On robustness of on-line boosting—a competitive study," pp. 1362-1369, .
[11] H. Masnadi-Shirazi, V. Mahadevan, N. Vasconcelos, "On the design of robust classifiers for computer vision," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 779-786, DOI: 10.1109/cvpr.2010.5540136, .
[12] H. Grabner, C. Leistner, H. Bischof, "Semi-supervised on-line boosting for robust tracking," Proceedings of the European Conference on Computer Vision, pp. 234-247, .
[13] A. Saffari, C. Leistner, M. Godec, H. Bischof, "Robust multi-view boosting with priors," Proceedings of the European Conference on Computer Vision, pp. 776-789, .
[14] B. Babenko, M. H. Yang, S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33 no. 8, pp. 1619-1632, 2010.
[15] B. Zeisl, C. Leistner, A. Saffari, H. Bischof, "On-line semi-supervised multiple-instance boosting," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1879-1886, .
[16] D. S. Bolme, J. R. Beveridge, B. A. Draper, Y. M. Lui, "Visual object tracking using adaptive correlation filters," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2544-2550, .
[17] C. Ma, J. B. Huang, X. Yang, M. H. Yang, "Hierarchical convolutional features for visual tracking," Proceedings of the IEEE International Conference on Computer Vision, pp. 3074-3082, .
[18] A. Lukezic, T. Vojir, L. C. Zajc, J. Matas, M. Kristan, "Discriminative correlation filter with channel and spatial reliability," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6309-6318, .
[19] M. Danelljan, F. S. Khan, M. Felsberg, J. van de Weijer, "Adaptive color attributes for real-time visual tracking," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1090-1097, .
[20] Q. Wang, J. Gao, J. Xing, M. Zhang, W. Hu, "DCFNet: discriminant correlation filters network for visual tracking," 2017. https://arxiv.org/abs/1704.04057
[21] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. S. Torr, "Fully-convolutional siamese networks for object tracking," European Conference on Computer Vision, pp. 850-865, .
[22] B. Li, J. Yan, W. Wu, Z. Zhu, X. Hu, "High performance visual tracking with siamese region proposal network," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971-8980, .
[23] Q. Wang, Z. Teng, J. Xing, J. Gao, A. Vedaldi, P. H. S. Torr, "Learning attentions: residual attentional siamese network for high performance online visual tracking," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4854-4863, .
[24] Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan, W. Hu, "Distractor-aware siamese networks for visual object tracking," Proceedings of the European Conference on Computer Vision, pp. 101-117, .
[25] B. Huang, T. Xu, Z. Shen, S. Jiang, B. Zhao, Z. Bian, "SiamATL: online update of siamese tracking network via attentional transfer learning," IEEE Transactions on Cybernetics,DOI: 10.1109/TCYB.2020.3043520, 2021.
[26] L. C. Zhang, A. Gonzalez-Garcia, J. Weijer, M. Danelljan, F. S. Khan, "Learning the model update for siamese trackers," Proceedings of the IEEE International Conference on Computer Vision, pp. 4010-4019, .
[27] B. Li, W. Xie, W. Zeng, W. Liu, "Learning to update for object tracking with recurrent meta-learner," IEEE Transactions on Image Processing, vol. 28 no. 7, pp. 3624-3635, DOI: 10.1109/tip.2019.2900577, 2019.
[28] M. M. Wang, Y. Liu, Z. Huang, "Large margin object tracking with circulant feature maps," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4021-4029, .
[29] Z. Sun, Q. D. Li, L. Wang, J. F. Wu, "Deep learning based visual object tracker with template update," University Politehnica of Bucharest Scientific Bulletin-Series A-Applied Mathema, vol. 82 no. 2, pp. 65-76, 2020.
[30] Z. Zhu, W. Zou, G. Huang, D. Du, C. Huang, "High performance visual object tracking with unified convolutional networks," 2019. https://arxiv.org/abs/1908.09445
[31] B. R. Jiang, R. X. Luo, J. Y. Mao, T. Xiao, Y. Jiang, "Acquisition of localization confidence for accurate object detection," Proceedings of the European Conference on Computer Vision, pp. 784-799, .
[32] M. Danelljan, G. Bhat, F. S. Khan, M. Felsberg, "Atom: accurate tracking by overlap maximization," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4660-4669, .
[33] Y. Wu, J. Lim, M. H. Yang, "Online object tracking: a benchmark," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411-2418, .
[34] Y. Wu, J. Lim, M.-H. Yang, "Object tracking benchmark," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37 no. 9, pp. 1834-1848, DOI: 10.1109/tpami.2014.2388226, 2015.
[35] M. Danelljan, G. Hager, F. S. Khan, M. Felsberg, "Learning spatially regularized correlation filters for visual tracking," Proceedings of the IEEE International Conference on Computer Vision, pp. 4310-4318, .
[36] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, P. H. S. Torr, "Staple: complementary learners for real-time tracking," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1401-1409, .
[37] J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, P. H. S. Torr, "End-to-end representation learning for correlation filter based tracking," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2805-2813, .
[38] A. Geiger, P. Lenz, R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354-3361, .
Copyright © 2021 Zhou Zhu et al.
Abstract
In this paper, we address the problem of online updating of a visual object tracker for car sharing services. The key idea is to adjust the updating rate adaptively according to the tracking performance of the current frame. Instead of setting a fixed weight for all frames in the updating of the object model, we assign the current frame a larger weight if its corresponding tracking result is relatively accurate and complete and a smaller weight otherwise. To implement this, the current estimated bounding box's intersection over union (IOU) is calculated by an IOU predictor trained offline on a large number of image pairs and is used as guidance to adjust the updating weights online. Finally, we embed the proposed model update strategy in a lightweight baseline tracker. Experimental results on both traffic and nontraffic datasets verify that, although some error in the predicted IOU is inevitable, the proposed method can still improve the accuracy of object tracking compared with the baseline object tracker.