Abstract

To detect targets of different sizes, detectors such as YOLO V3 and DSSD produce outputs at multiple scales. To improve detection performance, YOLO V3 and DSSD perform feature fusion by combining two adjacent scales. However, fusing only adjacent scales is not sufficient: it does not take advantage of the features at the remaining scales. Moreover, concatenation, the common operation for feature fusion, provides no mechanism to learn the importance and correlation of the features at different scales. In this paper, we propose adaptive feature fusion with attention mechanism (AFFAM) for multi-scale target detection. AFFAM utilizes a pathway layer and a subpixel convolution layer to resize the feature maps, which helps to learn better and more complex feature mappings. In addition, AFFAM utilizes a global attention mechanism and a spatial position attention mechanism to adaptively learn, respectively, the correlation of the channel features and the importance of the spatial features at different scales. Finally, we combine AFFAM with YOLO V3 to build an efficient multi-scale target detector. Comparative experiments are conducted on the PASCAL VOC, KITTI and Smart UVM datasets. YOLO V3 with AFFAM achieves 84.34% mean average precision (mAP) at 19.9 FPS on PASCAL VOC, 87.2% mAP at 21 FPS on KITTI and 99.22% mAP at 20.6 FPS on Smart UVM, outperforming state-of-the-art target detectors.
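As a rough illustration of two building blocks the abstract names, the following is a minimal PyTorch sketch of subpixel-convolution upsampling (via pixel shuffle) and an SE-style global channel attention used to reweight concatenated multi-scale features. The module names, shapes, and the SE-style weighting are illustrative assumptions for this sketch, not the paper's exact AFFAM design.

import torch
import torch.nn as nn

class SubpixelUpsample(nn.Module):
    # Upsample by factor r: a 1x1 conv expands channels by r*r, then
    # pixel shuffle rearranges them into spatial resolution (one common
    # form of "subpixel convolution"; this module is hypothetical).
    def __init__(self, in_ch, out_ch, r=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.conv(x))

class ChannelAttentionFusion(nn.Module):
    # SE-style global attention: pool global context, learn one weight
    # per channel, and reweight the concatenated multi-scale features
    # before a 1x1 fusion conv (a stand-in for the global attention
    # mechanism described in the abstract, not the authors' exact layer).
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.weights = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, coarse_up, fine):
        x = torch.cat([coarse_up, fine], dim=1)   # concatenate two scales
        return self.fuse(x * self.weights(x))     # reweight, then fuse

# Toy usage: fuse a 13x13 coarse map into a 26x26 fine map.
coarse = torch.randn(1, 512, 13, 13)
fine = torch.randn(1, 256, 26, 26)
up = SubpixelUpsample(512, 256)(coarse)           # -> (1, 256, 26, 26)
fused = ChannelAttentionFusion(512)(up, fine)     # -> (1, 512, 26, 26)

Note that plain concatenation (the baseline the abstract criticizes) would stop after torch.cat; the learned sigmoid weights are what allow the fusion to rank channels coming from different scales.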

Details

Title
Adaptive feature fusion with attention mechanism for multi-scale target detection
Author
Ju, Moran 1; Luo, Jiangning 2; Wang, Zhongbo 1; Luo, Haibo 3

1 Chinese Academy of Sciences, Shenyang Institute of Automation, Shenyang, China (GRID:grid.9227.e) (ISNI:0000 0001 1957 3309); Chinese Academy of Sciences, Institutes for Robotics and Intelligent Manufacturing, Shenyang, China (GRID:grid.9227.e) (ISNI:0000 0001 1957 3309); University of Chinese Academy of Sciences, Beijing, China (GRID:grid.410726.6) (ISNI:0000 0004 1797 8419); Chinese Academy of Sciences, Key Laboratory of Opt-Electronic Information Processing, Shenyang, China (GRID:grid.9227.e) (ISNI:0000 0001 1957 3309); The Key Laboratory of Image Understanding and Computer Vision, Shenyang, China (GRID:grid.9227.e)
2 McGill University, Montreal, Canada (GRID:grid.14709.3b) (ISNI:0000 0004 1936 8649)
3 Chinese Academy of Sciences, Shenyang Institute of Automation, Shenyang, China (GRID:grid.9227.e) (ISNI:0000 0001 1957 3309); Chinese Academy of Sciences, Institutes for Robotics and Intelligent Manufacturing, Shenyang, China (GRID:grid.9227.e) (ISNI:0000 0001 1957 3309); Chinese Academy of Sciences, Key Laboratory of Opt-Electronic Information Processing, Shenyang, China (GRID:grid.9227.e) (ISNI:0000 0001 1957 3309); The Key Laboratory of Image Understanding and Computer Vision, Shenyang, China (GRID:grid.9227.e)
Pages
2769-2781
Publication year
2021
Publication date
Apr 2021
Publisher
Springer Nature B.V.
ISSN
0941-0643
e-ISSN
1433-3058
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2502555687
Copyright
© Springer-Verlag London Ltd., part of Springer Nature 2020.