Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks

Abstract

Visual tracking of generic objects is one of the fundamental but challenging problems in computer vision. Here, we propose a novel fully convolutional Siamese network to solve visual tracking by directly predicting the target bounding box in an end-to-end manner. We first reformulate the visual tracking task as two subproblems: a classification problem for pixel category prediction and a regression task for object status estimation at this pixel. With this decomposition, we design a simple yet effective Siamese architecture based classification and regression framework, termed SiamCAR, which consists of two subnetworks: a Siamese subnetwork for feature extraction and a classification-regression subnetwork for direct bounding box prediction. Since the proposed framework is both proposal- and anchor-free, SiamCAR can avoid the tedious hyper-parameter tuning of anchors, considerably simplifying the training. To demonstrate that a much simpler tracking framework can achieve superior tracking results, we conduct extensive experiments and comparisons with state-of-the-art trackers on a few challenging benchmarks. Without bells and whistles, SiamCAR achieves leading performance with a real-time speed. Furthermore, the ablation study validates that the proposed framework is effective with various backbone networks, and can benefit from deeper networks. Code is available at https://github.com/ohhhyeahhh/SiamCAR.
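As a rough illustration of the anchor-free formulation described in the abstract, the sketch below picks the highest-scoring location in a per-pixel classification map and converts the regression output at that location into a bounding box. The abstract does not specify how the "object status" is parameterized; the (left, top, right, bottom) distance form, the stride and offset values, and the function name decode_box are assumptions made for illustration only, not the authors' implementation (see the linked repository for that).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in search-image pixels


def decode_box(
    cls_scores: List[List[float]],
    reg_offsets: List[List[Tuple[float, float, float, float]]],
    stride: int = 8,
    offset: int = 0,
) -> Tuple[Box, float]:
    """Pick the best-scoring location and turn its regressed distances into a box.

    cls_scores[i][j]  : foreground score at feature-map row i, column j.
    reg_offsets[i][j] : ASSUMED (left, top, right, bottom) distances from that
                        location to the box sides, in image pixels.
    stride, offset    : HYPOTHETICAL mapping from feature-map to image coordinates.
    """
    best_i, best_j, best_score = 0, 0, float("-inf")
    for i, row in enumerate(cls_scores):
        for j, score in enumerate(row):
            if score > best_score:
                best_i, best_j, best_score = i, j, score

    # Map the feature-map location back to image coordinates.
    cx = offset + stride * best_j
    cy = offset + stride * best_i

    # No anchors or proposals: the box comes directly from the four distances.
    left, top, right, bottom = reg_offsets[best_i][best_j]
    return (cx - left, cy - top, cx + right, cy + bottom), best_score


# Toy 2x2 maps: the bottom-right location has the highest score.
scores = [[0.1, 0.2], [0.3, 0.9]]
offsets = [
    [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0, 0.0), (5.0, 4.0, 7.0, 6.0)],
]
box, score = decode_box(scores, offsets)
```

Because the box is read directly from one location's regression output, there are no anchor scales or aspect ratios to tune, which is the simplification the abstract points to.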

Details

Title

Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks

Author

Cui, Ying¹; Guo, Dongyan¹; Shao, Yanyan¹; Wang, Zhenhua¹; Shen, Chunhua²; Zhang, Liyan³; Chen, Shengyong⁴

¹ Zhejiang University of Technology, College of Computer Science and Technology, Hangzhou, China (GRID:grid.469325.f) (ISNI:0000 0004 1761 325X)
² Zhejiang University, Hangzhou, China (GRID:grid.13402.34) (ISNI:0000 0004 1759 700X)
³ Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, Nanjing, China (GRID:grid.64938.30) (ISNI:0000 0000 9558 9911)
⁴ Tianjin University of Technology, School of Computer Science and Engineering, Tianjin, China (GRID:grid.265025.6) (ISNI:0000 0000 9736 3676)

Pages

550-566

Publication year

2022

Publication date

Feb 2022

Publisher

Springer Nature B.V.

ISSN

09205691

e-ISSN

15731405

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1007/s11263-021-01559-4

ProQuest document ID

2629163066

© The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
