Abstract

The ability to understand, generate, and ultimately act within our 3D world, an ability referred to as spatial intelligence, is a fundamental aspect of human cognition and a central goal of artificial intelligence. However, developing spatial intelligence faces fundamental challenges in scaling compute and data, the two factors that drive the progress of modern AI. This thesis aims to advance spatial intelligence by developing a suite of compute- and data-efficient algorithms that address these challenges.

From the compute perspective, this thesis introduces a set of highly efficient 3D and 4D algorithms that achieve over 1000x speedups and 10,000-100,000x memory savings compared to existing methods, while maintaining comparable or better performance. These algorithms, widely adopted in subsequent research, significantly improve the efficiency and scalability of 3D and 4D pipelines, and can be seamlessly integrated into standard deep learning frameworks to better utilize compute resources.

From the data perspective, this thesis investigates how to leverage 2D foundation models to overcome the scarcity of 3D data and supervision. This leads to a series of data-efficient algorithms for both 3D generation and 3D understanding, the two primary pillars of spatial intelligence. For the first time, we demonstrate that large-scale 3D scenes can be generated purely from 2D generative priors, and that 3D vision-language grounding can be advanced by distilling knowledge from 2D vision-language models without requiring direct 3D supervision. These methods highlight the potential of 2D foundation models to enhance spatial intelligence through data efficiency.

Finally, this thesis explores model self-improvement as an alternative approach to mitigating data scarcity in both 2D and 3D domains. We show that 2D vision-language models can iteratively improve themselves by generating, refining, and learning from their own data through self-inspection, supported by image editing tools. Although this exploration begins in 2D, it opens a promising direction toward extending self-improving models to 3D and 4D domains, bringing us closer to scalable, generalizable spatial intelligence.

Details

1010268
Business indexing term
Title
Toward Spatial Intelligence via Data and Compute Efficiency
Author
Number of pages
258
Publication year
2025
Degree date
2025
School code
0127
Source
DAI-B 87/7(E), Dissertation Abstracts International
ISBN
9798273310551
Committee member
Owens, Andrew; Vedaldi, Andrea; Yu, Stella X.
University/institution
University of Michigan
Department
Computer Science & Engineering
University location
United States -- Michigan
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32477039
ProQuest document ID
3292594311
Document URL
https://www.proquest.com/dissertations-theses/toward-spatial-intelligence-via-data-compute/docview/3292594311/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic