Abstract

Panoramic 3D object detection is challenging because of image distortion, sensor heterogeneity, and the difficulty of combining information from multiple modalities over a wide field of view (FoV). To address these issues, we propose SMM-POD, a novel framework that introduces a spherical multi-stage fusion strategy for panoramic 3D detection. Our approach creates a five-channel spherical image aligned with LiDAR data and uses a quasi-uniform Voronoi sphere (UVS) model to reduce projection distortion. A cross-attention-based feature extraction module and a transformer encoder–decoder with spherical positional encoding enable accurate and efficient fusion of image and point cloud features. For precise 3D localization, we adopt a Frustum PointNet module. Experiments on the DAIR-V2X-I benchmark and our self-collected SHU-3DPOD dataset show that SMM-POD achieves state-of-the-art performance across all object categories. It significantly improves the detection of small objects such as cyclists and pedestrians and maintains stable results under various environmental conditions. These results demonstrate the effectiveness of SMM-POD in panoramic multi-modal 3D perception and establish it as a strong baseline for wide-FoV object detection.

Details

Title
SMM-POD: Panoramic 3D Object Detection via Spherical Multi-Stage Multi-Modal Fusion
Author
Zhang, Jinghan 1; Yang, Yusheng 1; Gao, Zhiyuan 1; Shi, Hang 2; Xie, Yangmin 1

1 School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China; [email protected] (J.Z.); [email protected] (Y.Y.); Shanghai Key Laboratory of Intelligent Manufacturing and Robotics, Shanghai University, Shanghai 200444, China
2 School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China; [email protected] (J.Z.); [email protected] (Y.Y.)
Publication title
Remote Sensing
Volume
17
Issue
12
First page
2089
Number of pages
23
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
2072-4292
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-06-18
Milestone dates
2025-04-08 (Received); 2025-06-16 (Accepted)
First posting date
2025-06-18
ProQuest document ID
3223939881
Document URL
https://www.proquest.com/scholarly-journals/smm-pod-panoramic-3d-object-detection-via/docview/3223939881/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-06-25
Database
ProQuest One Academic