Abstract

Semantic scene completion (SSC) is essential for autonomous driving and 3D scene understanding, and it closely mirrors the way humans perceive and interpret complex environments. A key element of human perception is temporal memory, which facilitates the rapid recognition and recall of previously observed elements. To emulate this capability in artificial intelligence systems, we enhance VoxFormer, a model originally designed for spatial transformation, by integrating a temporal memory component. The upgraded model, VoxFormer v2, incorporates tri-plane deformable temporal attention and a recurrent temporal fusion strategy. These innovations significantly improve the model’s ability to process and understand short-term temporal dynamics in scene data. Evaluations on the SemanticKITTI and KITTI-360 datasets show that VoxFormer v2 establishes a new state of the art in SSC performance.

Details

Title
VoxFormer v2: Semantic Scene Completion With Spatiotemporal Voxel Transformer
Author
Chen, Nuo
Publication year
2024
Publisher
ProQuest Dissertations & Theses
ISBN
9798382721224
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
3058396397
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.