
Abstract

The semantic scene graph represents an innovative approach to 3D scene representation. Its semantic object nodes exhibit robust viewpoint invariance, effectively overcoming the limitations of traditional visual localization methods. By organizing dense indoor environments into hierarchical data structures, semantic scene graphs enable efficient representation and processing. In multi-agent SLAM systems, this representation supports both coarse-to-fine localization and bandwidth-efficient communication.
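
As a rough illustration of this hierarchical organization, the Python sketch below models a scene graph as object nodes grouped under room nodes; the layer names and fields are illustrative assumptions, not the exact schema used in the thesis.

# Minimal sketch of a hierarchical semantic scene graph (illustrative only;
# the layer names and fields are assumptions, not the thesis's exact schema).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectNode:
    """Leaf node: one semantic object with a viewpoint-invariant label."""
    node_id: int
    label: str                       # open-set semantic label, e.g. "chair"
    centroid: Tuple[float, float, float]  # position in the agent's map frame

@dataclass
class RoomNode:
    """Intermediate node grouping the objects observed in one room."""
    room_id: int
    objects: List[ObjectNode] = field(default_factory=list)

@dataclass
class SceneGraph:
    """Root node: the sparse, hierarchical representation one agent maintains."""
    agent_id: str
    rooms: List[RoomNode] = field(default_factory=list)

# Example: a two-room scene compactly described by a handful of object nodes.
graph = SceneGraph(agent_id="agent_0", rooms=[
    RoomNode(room_id=0, objects=[ObjectNode(0, "sofa", (1.2, 0.4, 0.0)),
                                 ObjectNode(1, "table", (2.0, 1.1, 0.0))]),
    RoomNode(room_id=1, objects=[ObjectNode(2, "bed", (5.3, 0.9, 0.0))]),
])

Because agents exchange such sparse node-level summaries rather than dense maps, this kind of representation lends itself to the bandwidth-efficient communication described above.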

This thesis addresses the challenge of registering two rigid semantic scene graphs, an essential capability when an autonomous agent needs to register its map against that of a remote agent or against a prior map. Classical semantic-aided registration relies on hand-crafted descriptors, while learning-based scene graph registration depends on ground-truth annotations; both limitations impede deployment in practical real-world environments. To address these challenges, we design a scene graph network that encodes multiple modalities of each semantic node: an open-set semantic feature, local topology with spatial awareness, and a shape feature. These modalities are fused into compact semantic node features. Matching layers then search for correspondences in a coarse-to-fine manner, and in the back-end a robust pose estimator computes the transformation from the correspondences. Throughout, we maintain a sparse, hierarchical scene representation, so our approach demands fewer GPU resources and less communication bandwidth in multi-agent tasks. Furthermore, we propose a novel data generation pipeline that combines vision foundation models with a semantic mapping module to reconstruct semantic scene graphs. This approach eliminates the need for ground-truth semantic annotations, enabling fully self-supervised network training. Extensive evaluation on a two-agent SLAM benchmark demonstrates that our method: (1) achieves significantly higher registration success rates compared to hand-crafted baselines, and (2) maintains superior registration recall relative to visual localization networks while requiring merely 52 KB of communication bandwidth per query frame, an orders-of-magnitude improvement in transmission efficiency.
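
As a rough sketch of how such multi-modal node encoding and coarse correspondence search could look, the PyTorch snippet below concatenates and projects the three per-node modalities into a compact feature and then matches nodes across two graphs by mutual nearest neighbors under a cosine-similarity threshold; the module names, feature dimensions, and matching rule are illustrative assumptions rather than the thesis's actual architecture.

# Illustrative sketch of multi-modal node-feature fusion and coarse matching,
# assuming PyTorch; dimensions and the similarity threshold are assumptions.
import torch
import torch.nn as nn

class NodeFusion(nn.Module):
    """Fuse open-set semantic, spatial-topology, and shape features per node."""
    def __init__(self, d_sem=512, d_topo=128, d_shape=256, d_out=256):
        super().__init__()
        self.proj = nn.Linear(d_sem + d_topo + d_shape, d_out)

    def forward(self, f_sem, f_topo, f_shape):
        # Concatenate the three modalities and project to a compact node feature.
        return self.proj(torch.cat([f_sem, f_topo, f_shape], dim=-1))

def coarse_node_matches(feat_a, feat_b, thresh=0.8):
    """Return index pairs of mutually nearest nodes whose cosine similarity
    exceeds a threshold; these seed the finer matching stage."""
    a = nn.functional.normalize(feat_a, dim=-1)
    b = nn.functional.normalize(feat_b, dim=-1)
    sim = a @ b.t()                   # (Na, Nb) cosine-similarity matrix
    nn_ab = sim.argmax(dim=1)         # best match in B for each node of A
    nn_ba = sim.argmax(dim=0)         # best match in A for each node of B
    return [(i, int(j)) for i, j in enumerate(nn_ab)
            if int(nn_ba[j]) == i and sim[i, j] > thresh]

# Example: fuse node features for two graphs (20 and 25 nodes) and match them.
fusion = NodeFusion()
fa = fusion(torch.randn(20, 512), torch.randn(20, 128), torch.randn(20, 256))
fb = fusion(torch.randn(25, 512), torch.randn(25, 128), torch.randn(25, 256))
print(coarse_node_matches(fa, fb))

The resulting node-level correspondences would then feed a finer matching stage and a robust estimator (for instance, RANSAC-style outlier rejection) to compute the relative transformation between the two maps.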

Details

Title
Semantic Scene Graph and Multi-Agent Visual SLAM
Number of pages
113
Publication year
2025
Degree date
2025
School code
1223
Source
DAI-A 87/5(E), Dissertation Abstracts International
ISBN
9798265413369
University/institution
Hong Kong University of Science and Technology (Hong Kong)
University location
Hong Kong
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32307075
ProQuest document ID
3275479103
Document URL
https://www.proquest.com/dissertations-theses/semantic-scene-graph-multi-agent-visual-slam/docview/3275479103/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic