
Abstract

The extensive growth in data-intensive science and industrial analytics has magnified the importance of achieving high-throughput and energy-efficient data movement over heterogeneous networks and compute environments. Existing solutions for data movement often rely on static, one-size-fits-all parameter configurations that cannot adapt to fluctuations in network bandwidth, end-system contention, or filesystem performance demands. Consequently, these approaches either fail to maximize throughput or incur substantial energy overheads.

In this dissertation, we present a family of novel solutions that jointly optimize data movement performance and energy consumption through cross-layer adaptation spanning the application layer, kernel configuration, and runtime environment. First, we propose a two-phase, decision-tree-based framework that reduces uncertainty when optimizing throughput and energy efficiency in data transfer applications. Its offline component clusters historical data transfer logs to identify robust application and kernel parameters; an online algorithm then adapts concurrency, parallelism, CPU core allocation, and frequency scaling to real-time conditions. This cross-layer solution achieves up to 117% higher throughput and 19% lower energy consumption than traditional methods.
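The two-phase structure described above (an offline pass over historical logs, then an online adjustment loop) can be illustrated with a minimal Python sketch. It is not the dissertation's decision-tree algorithm: the `TransferLog` record, the coarse bucketing that stands in for clustering, the throughput-to-energy score, and the 5% hill-climbing rule are all hypothetical choices made for illustration, and CPU core allocation and frequency scaling are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferLog:
    # Hypothetical historical log record: network conditions plus the parameters used.
    bandwidth_gbps: float
    rtt_ms: float
    concurrency: int
    parallelism: int
    throughput_gbps: float
    energy_j: float


def condition_key(bandwidth_gbps, rtt_ms):
    # Coarse bucketing of network conditions; a stand-in for the offline clustering step.
    return (round(bandwidth_gbps), int(rtt_ms // 10))


def offline_phase(logs):
    """For each condition bucket, keep the (concurrency, parallelism) pair with the
    highest throughput-to-energy ratio observed in the historical logs."""
    best = {}
    for log in logs:
        key = condition_key(log.bandwidth_gbps, log.rtt_ms)
        score = log.throughput_gbps / log.energy_j
        if key not in best or score > best[key][0]:
            best[key] = (score, log.concurrency, log.parallelism)
    return {key: (c, p) for key, (_, c, p) in best.items()}


def online_step(concurrency, measured_gbps, previous_gbps, max_concurrency=32):
    """One step of a simple online rule: raise concurrency while throughput improves
    by more than 5%, lower it when throughput drops by more than 5%, else hold."""
    if measured_gbps > previous_gbps * 1.05:
        return min(concurrency + 1, max_concurrency)
    if measured_gbps < previous_gbps * 0.95:
        return max(concurrency - 1, 1)
    return concurrency
```

In this sketch the offline table would seed the starting (concurrency, parallelism) pair for a new transfer, and the online rule would refine it as throughput measurements arrive.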

Recognizing the high cost of gathering environment-specific historical data, and the need for a dedicated application-level solution that adapts broadly, we further introduce learning-based approaches that generalize across diverse network conditions without relying on extensive historical logs. By combining Deep Reinforcement Learning (DRL) with multi-parameter optimization, these frameworks dynamically adjust the number of parallel TCP streams and application-layer concurrency, yielding up to 25% higher throughput and 40% energy savings while converging 40% faster than conventional algorithms. They also integrate fairness and congestion-avoidance mechanisms to keep network performance stable across competing flows.
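The learning-based tuner described above uses DRL; the sketch below swaps in plain tabular Q-learning, a much simpler reinforcement-learning method, only to show the shape of the loop. The state is a hypothetical (parallelism, concurrency) pair, each action nudges one parameter by one, and a hypothetical reward trades throughput against energy with an assumed weight. Fairness and congestion avoidance are not modeled.

```python
import random

# Hypothetical action set: (delta parallelism, delta concurrency).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]


def reward(throughput_gbps, energy_j, energy_weight=0.01):
    # Hypothetical scalarized multi-objective reward: favor throughput, penalize energy.
    return throughput_gbps - energy_weight * energy_j


def apply_action(state, a, max_value=16):
    # Apply an action and clamp both parameters to [1, max_value].
    p, c = state
    dp, dc = ACTIONS[a]
    return (min(max(p + dp, 1), max_value), min(max(c + dc, 1), max_value))


class QAgent:
    """Tabular Q-learning with epsilon-greedy exploration (a stand-in for DRL)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = {}  # (state, action_index) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def value(self, state, a):
        return self.q.get((state, a), 0.0)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the highest-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.value(state, a))

    def update(self, state, a, r, next_state):
        # Standard one-step temporal-difference update toward r + gamma * max_a' Q(s', a').
        best_next = max(self.value(next_state, b) for b in range(len(ACTIONS)))
        td_target = r + self.gamma * best_next
        old = self.value(state, a)
        self.q[(state, a)] = old + self.alpha * (td_target - old)
```

A deep RL agent would replace the lookup table with a neural network so that it can generalize to states it has never visited, which is the property the abstract emphasizes.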

Building on these cross-layer, energy-aware principles, we then apply a similar approach to I/O for distributed machine learning through Efficient Machine Learning I/O (EMLIO). EMLIO co-locates lightweight daemons on storage nodes that pre-batch and serialize training data shards and move them over multi-stream TCP/ZeroMQ channels, and it integrates seamlessly with GPU-accelerated preprocessing (e.g., NVIDIA DALI). In our evaluations, EMLIO delivers up to 8.6x faster I/O and 10.9x lower energy consumption than state-of-the-art ML data loaders, while keeping its performance and energy profiles constant regardless of network distance.
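To make the pre-batching and multi-stream idea concrete, here is a minimal Python sketch that assumes samples arrive as raw byte strings. The length-prefixed framing and round-robin stream assignment are illustrative choices, not EMLIO's actual wire format, and the sketch leaves out the TCP/ZeroMQ transport entirely.

```python
import struct


def pre_batch(samples, batch_size):
    # Group raw samples (bytes) into fixed-size batches; the last batch may be short.
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def serialize_batch(batch):
    # Illustrative framing: a 4-byte big-endian sample count, then each sample
    # as a 4-byte big-endian length followed by its payload.
    parts = [struct.pack("!I", len(batch))]
    for sample in batch:
        parts.append(struct.pack("!I", len(sample)))
        parts.append(sample)
    return b"".join(parts)


def deserialize_batch(buf):
    # Inverse of serialize_batch: read the count, then each length-prefixed sample.
    (count,) = struct.unpack_from("!I", buf, 0)
    offset, batch = 4, []
    for _ in range(count):
        (n,) = struct.unpack_from("!I", buf, offset)
        offset += 4
        batch.append(buf[offset:offset + n])
        offset += n
    return batch


def assign_to_streams(frames, num_streams):
    # Round-robin assignment of serialized batches to parallel streams.
    streams = [[] for _ in range(num_streams)]
    for i, frame in enumerate(frames):
        streams[i % num_streams].append(frame)
    return streams
```

Batching on the storage side means the receiver handles a few large frames instead of many small files, and spreading those frames across several streams is one way to keep a long-distance link busy.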

Beyond bulk data transfers, we investigate end-to-end scientific data streaming under near-real-time constraints. Our NUMA-aware runtime system aligns memory-intensive tasks (e.g., compression) with their local memory domains, delivering up to 1.48x higher throughput than state-of-the-art methods and a 2.6x speedup over conventional approaches. We also develop FlowTracer, a tool targeted specifically at AI training workloads that detects and corrects imbalances in equal-cost multi-path (ECMP) routing within leaf-spine networks, reducing path skew by 30% and alleviating throughput degradation.
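The abstract does not define how FlowTracer measures path skew. Assuming skew is the gap between the busiest and idlest ECMP path divided by the mean path load, a minimal Python sketch of detecting imbalance and greedily moving one flow might look like the following; the metric, the `(path_index, rate_gbps)` flow representation, and the greedy move rule are all hypothetical.

```python
def path_loads(flows, num_paths):
    # flows: list of (path_index, rate_gbps) pairs (hypothetical representation).
    loads = [0.0] * num_paths
    for path, rate in flows:
        loads[path] += rate
    return loads


def path_skew(loads):
    # Hypothetical skew metric: (busiest - idlest) / mean path load.
    mean = sum(loads) / len(loads)
    return (max(loads) - min(loads)) / mean if mean > 0 else 0.0


def rebalance_once(flows, num_paths):
    """Move the smallest flow on the busiest path to the idlest path,
    but only if doing so lowers the skew; otherwise return the flows unchanged."""
    loads = path_loads(flows, num_paths)
    hot = loads.index(max(loads))
    cold = loads.index(min(loads))
    candidates = [i for i, (p, _) in enumerate(flows) if p == hot]
    if hot == cold or not candidates:
        return flows
    idx = min(candidates, key=lambda i: flows[i][1])
    moved = list(flows)
    moved[idx] = (cold, flows[idx][1])
    if path_skew(path_loads(moved, num_paths)) < path_skew(loads):
        return moved
    return flows
```

The skew check guards against moving an elephant flow that would simply shift the hotspot to another path; a real switch would also need to re-pin the flow's hash, which this sketch does not attempt.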

Collectively, these contributions lay a robust foundation for multi-objective optimization of data movement and distributed training in shared environments. By unifying cross-layer decision-tree methods, reinforcement-learning policies, energy-aware I/O services, NUMA-aware runtime designs, and multi-path route monitoring tools, they significantly enhance throughput, reduce energy costs, and maintain fairness across large-scale, heterogeneous workloads.

Details

1010268
Title: Optimizing Data Movement Performance and Energy Efficiency in Distributed Systems Under Shared Resource Constraints
Author:
Number of pages: 133
Publication year: 2025
Degree date: 2025
School code: 0656
Source: DAI-B 87/3(E), Dissertation Abstracts International
ISBN: 9798293833016
Committee member: Ziarek, Lukasz; Qiao, Chunming
University/institution: State University of New York at Buffalo
Department: Computer Science and Engineering
University location: United States -- New York
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32236254
ProQuest document ID: 3250259443
Document URL: https://www.proquest.com/dissertations-theses/optimizing-data-movement-performance-energy/docview/3250259443/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic