Abstract

As deep learning models grow in scale and complexity, efficient distributed training requires not only advanced parallelization strategies but also intelligent placement of model components across heterogeneous computing infrastructures. Existing device placement frameworks often assume simplified, uniform network topologies, which leads to suboptimal performance in real-world data centers where communication costs vary significantly across nodes. In this thesis, I present NEST, a network-aware, efficient device placement framework based on structured dynamic programming techniques. NEST jointly optimizes device placement and parallelism configuration by explicitly modeling the hierarchical and oversubscribed nature of modern data center networks. It supports a broad range of parallelization strategies, including tensor, pipeline, data, expert, and Zero Redundancy Optimizer (ZeRO) parallelism, and integrates detailed memory and communication cost modeling. Through structured dynamic programming, NEST explores the vast placement space efficiently and offers provable optimality guarantees within its search scope. Evaluations across realistic workloads and network settings show that NEST consistently outperforms manual and network-unaware baselines, delivering significant improvements in training throughput and resource utilization.
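
To illustrate the general idea of network-aware placement by dynamic programming, the following is a minimal toy sketch, not the NEST algorithm itself. It assumes a single pipeline of layers split into contiguous stages, with stage s placed on device s and devices grouped into racks. The function name `place_pipeline` and all of its parameters (`layer_cost`, `act_bytes`, `devices_per_rack`, `intra_bw`, `inter_bw`) are hypothetical and chosen only for this example; costs and bandwidths are assumed to be in consistent units.

```python
from functools import lru_cache
from itertools import accumulate


def place_pipeline(layer_cost, act_bytes, num_devices, devices_per_rack,
                   intra_bw, inter_bw):
    """Toy DP: split layers into contiguous pipeline stages (stage s -> device s).

    Minimizes the bottleneck stage time, where a stage's time is its compute
    cost plus the time to send its last layer's activations to the next stage.
    All names and the cost model are illustrative assumptions.
    """
    n = len(layer_cost)
    if not 1 <= num_devices <= n:
        raise ValueError("need between 1 and len(layer_cost) devices")
    prefix = [0, *accumulate(layer_cost)]  # prefix[i] = cost of layers [0, i)

    def link_bw(stage):
        # Devices in the same rack share a fast link; crossing racks is slower.
        same_rack = stage // devices_per_rack == (stage + 1) // devices_per_rack
        return intra_bw if same_rack else inter_bw

    @lru_cache(maxsize=None)
    def best(start, stage):
        # Best (bottleneck, cut points) for layers [start, n) on stages >= stage.
        if stage == num_devices - 1:
            return prefix[n] - prefix[start], ()
        candidates = []
        last_end = n - (num_devices - 1 - stage)  # leave >= 1 layer per later stage
        for end in range(start + 1, last_end + 1):
            send_time = act_bytes[end - 1] / link_bw(stage)
            stage_time = prefix[end] - prefix[start] + send_time
            rest_time, rest_cuts = best(end, stage + 1)
            candidates.append((max(stage_time, rest_time), (end, *rest_cuts)))
        return min(candidates)

    cost, cuts = best(0, 0)
    return cost, list(cuts)  # cuts[k] = index of the first layer of stage k + 1
```

With four equal-cost layers and `act_bytes = [1, 8, 1, 8]`, this sketch picks the balanced split (stage 1 starts at layer 2) when both devices share a rack, but shifts the cut to layer 1 when the two devices sit in different racks with a slow link, because sending the large activation across racks would dominate the stage time. A network-unaware search that assumed the fast link everywhere would keep the balanced split in both cases.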

Details

1010268
Title
Network-Aware Device Placement Search for Distributed Training
Number of pages
40
Publication year
2025
Degree date
2025
School code
0078
Source
MAI 87/5(E), Masters Abstracts International
ISBN
9798263339616
Committee member
Hao, Cong; Krishna, Tushar
University/institution
Georgia Institute of Technology
University location
United States -- Georgia
Degree
M.S.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32309954
ProQuest document ID
3275489644
Document URL
https://www.proquest.com/dissertations-theses/network-aware-device-placement-search-distributed/docview/3275489644/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works; open.access
Database
ProQuest One Academic