
Abstract

Small object detection is an essential but challenging task in computer vision. Transformer-based detectors have demonstrated remarkable performance on computer vision tasks, yet they suffer from inadequate feature extraction for small objects and are difficult to deploy on resource-constrained platforms because of their heavy computational burden. To tackle these problems, an efficient local-global fusion Transformer (ELFT) is proposed for small object detection, built on an attention and grouping strategy. Specifically, we first design an efficient local-global fusion attention (ELGFA) mechanism that extracts sufficient location features and integrates detailed information from feature maps, thereby improving accuracy. In addition, we present a grouped feature update module (GFUM) that reduces computational complexity by alternately updating high-level and low-level features within each group. Furthermore, a context broadcast module (CB) is introduced to obtain richer context information, further enhancing the ability to detect small objects. Extensive experiments on three benchmarks, i.e., Remote Sensing Object Detection (RSOD), NWPU VHR-10, and PASCAL VOC2007, achieve 95.8%, 94.3%, and 85.2% mean average precision (mAP), respectively. Compared to DINO, the number of parameters is reduced by 10.4% and the floating point operations (FLOPs) by 22.7%. These results demonstrate the efficacy of ELFT for small object detection while maintaining an attractive level of computational complexity.
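
As a rough illustration of the grouping idea summarized above (this is not the authors' code; the class name `GroupedFeatureUpdate`, the `num_groups` parameter, and the simple convolutional update blocks are all assumptions), the PyTorch sketch below alternates updates between high-level and low-level feature groups, so each forward pass refines only part of each level:

```python
import torch
import torch.nn as nn


class GroupedFeatureUpdate(nn.Module):
    """Hypothetical sketch of a grouped feature update.

    Channels are split into groups; even-indexed groups update the
    high-level map and odd-indexed groups update the low-level map,
    so each forward pass runs update blocks on only half of the
    groups per level.
    """

    def __init__(self, channels: int, num_groups: int = 4):
        super().__init__()
        assert channels % num_groups == 0
        self.num_groups = num_groups
        group_ch = channels // num_groups
        # Assumption: a lightweight 3x3 conv as the per-group update block.
        self.high_updates = nn.ModuleList(
            nn.Conv2d(group_ch, group_ch, 3, padding=1) for _ in range(num_groups)
        )
        self.low_updates = nn.ModuleList(
            nn.Conv2d(group_ch, group_ch, 3, padding=1) for _ in range(num_groups)
        )

    def forward(self, high: torch.Tensor, low: torch.Tensor):
        highs = torch.chunk(high, self.num_groups, dim=1)
        lows = torch.chunk(low, self.num_groups, dim=1)
        new_high, new_low = [], []
        for g in range(self.num_groups):
            if g % 2 == 0:  # even groups: refine the high-level features
                new_high.append(highs[g] + self.high_updates[g](highs[g]))
                new_low.append(lows[g])
            else:           # odd groups: refine the low-level features
                new_high.append(highs[g])
                new_low.append(lows[g] + self.low_updates[g](lows[g]))
        return torch.cat(new_high, dim=1), torch.cat(new_low, dim=1)


# Usage: two pyramid levels with the same channel count.
high = torch.randn(1, 64, 20, 20)  # coarse, semantically rich map
low = torch.randn(1, 64, 80, 80)   # fine, spatially detailed map
gfum = GroupedFeatureUpdate(channels=64, num_groups=4)
h2, l2 = gfum(high, low)
print(h2.shape, l2.shape)  # (1, 64, 20, 20) (1, 64, 80, 80)
```

The point of the sketch is that only half of the channel groups per level pass through an update block on each forward pass, which is one plausible way that "alternately updating high-level and low-level features within each group" could lower FLOPs. In the same hedged spirit, a minimal context-broadcast step (again an assumption, not necessarily the paper's CB) could be as simple as `tokens + tokens.mean(dim=1, keepdim=True)`, adding a global mean token back to every token to spread context information.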

Details

Title
ELFT: Efficient local-global fusion transformer for small object detection
Publication title
PLoS One; San Francisco
Volume
20
Issue
9
First page
e0332714
Number of pages
31
Publication year
2025
Publication date
Sep 2025
Section
Research Article
Publisher
Public Library of Science
Place of publication
San Francisco
Country of publication
United States
e-ISSN
1932-6203
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Milestone dates
2025-05-20 (Received); 2025-09-02 (Accepted); 2025-09-24 (Published)
ProQuest document ID
3254055215
Document URL
https://www.proquest.com/scholarly-journals/elft-efficient-local-global-fusion-transformer/docview/3254055215/se-2?accountid=208611
Copyright
© 2025 Hua et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-09-25
Database
2 databases
  • Coronavirus Research Database
  • ProQuest One Academic