Distributed inference in resource-constrained heterogeneous edge clusters is fundamentally limited by disparities in device capabilities and by load imbalance. Existing methods predominantly optimize a single-pipeline allocation scheme for partitioned sub-models; under concurrent batch processing, such approaches often lead to load imbalance and suboptimal resource utilization. To address these challenges, we propose a non-uniform deployment inference framework (NUDIF), which achieves high-throughput distributed inference by adapting to heterogeneous resources and balancing inter-stage processing capabilities. Formulated as a mixed-integer nonlinear programming (MINLP) problem, NUDIF plans the number of instances for each sub-model and determines the specific devices on which to deploy them, subject to computational capacity, memory constraints, and communication latency. This optimization minimizes inter-stage processing discrepancies and maximizes resource utilization. Experimental evaluations demonstrate that NUDIF improves system throughput by an average of 9.95% over traditional single-pipeline optimization methods across various cluster scales.
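The deployment problem described above can be illustrated with a toy pure-Python sketch (the function names and the simplified throughput model are ours, not from the paper, and exhaustive search stands in for the MINLP solver at toy scale): each device hosts one sub-model instance, a stage's throughput is the sum of its instances' device capacity divided by the stage's compute cost, and the pipeline throughput is limited by the bottleneck stage.

```python
from itertools import product

def pipeline_throughput(assignment, capacity, cost, n_stages):
    # Stage throughput = sum over its instances of (device capacity / stage cost);
    # the pipeline is limited by its slowest (bottleneck) stage.
    stage_tp = [0.0] * n_stages
    for dev, stage in enumerate(assignment):
        stage_tp[stage] += capacity[dev] / cost[stage]
    return min(stage_tp)

def best_deployment(capacity, cost):
    # Exhaustive search over all device -> stage assignments (toy scale only);
    # NUDIF solves the analogous instance-count/placement problem as an MINLP.
    n_stages = len(cost)
    best, best_tp = None, -1.0
    for assignment in product(range(n_stages), repeat=len(capacity)):
        if len(set(assignment)) < n_stages:  # every stage needs >= 1 instance
            continue
        tp = pipeline_throughput(assignment, capacity, cost, n_stages)
        if tp > best_tp:
            best, best_tp = assignment, tp
    return best, best_tp

# Four heterogeneous devices, two pipeline stages with unequal compute cost.
capacity = [4.0, 2.0, 2.0, 1.0]   # relative device speeds
cost = [2.0, 1.0]                 # per-request compute of each sub-model
plan, tp = best_deployment(capacity, cost)
```

Here the balanced plan places the fast device plus one mid-range device on the heavier first stage, matching the two stages' throughputs instead of replicating one fixed pipeline.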
Keywords: Dynamic programming; Edge computing; Communication; Bandwidths; Optimization; Neural networks; Inference; Adaptation; Unmanned aerial vehicles; Linear programming; Batch processing; Algorithms; Mixed integer; Clusters; Resource utilization; Energy consumption; Large language models; Nonlinear programming; Load balancing
1 National Key Laboratory of Complex Aviation System Simulation, Southwest China Institute of Electronic Technology, Chengdu 610036, China; [email protected]
2 School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications (BUPT), Beijing 100876, China; [email protected]