Abstract
In large language models (LLMs), full-parameter fine-tuning is crucial for task-specific adaptation. Traditionally, it relies on deep learning training frameworks that use the back-propagation scheme. However, this scheme has inherent issues, e.g., activation memory bottlenecks and backward locking, which limit the efficient use of computational resources. In this work, we present the design and analysis of ZeROf-Offload, a fine-tuning framework that adopts the forward-gradient scheme. The framework uses a forward-gradient-oriented CPU offload strategy, which enables billion-scale LLMs to be fine-tuned solely in the forward phase and improves computational efficiency. Empirical evaluations reveal the advantage of eliminating the backward phase in fine-tuning. ZeROf-Offload achieves 134 TFlops/GPU for models with over 130 billion parameters on a single DGX-A100 node, outperforming DeepSpeed’s ZeRO-Offload, which achieves 102 TFlops/GPU for models of up to 53.7 billion parameters, the largest size it can handle within GPU memory limits. Furthermore, we have extended ZeROf-Offload to multi-node DGX-A100 environments with integrated 3D parallelism, achieving near-linear speedup on up to 128 GPUs and improving token throughput by 1.4x and 1.5x, respectively. The experimental results demonstrate that ZeROf-Offload achieves the highest throughput among all examined state-of-the-art frameworks.
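To make "fine-tuning solely in the forward phase" concrete, the sketch below shows one common form of forward gradient: sample a random tangent direction v ~ N(0, I), compute the directional derivative ∇f·v with forward-mode automatic differentiation during the same forward pass that computes the loss, and use (∇f·v)v as an unbiased estimate of the gradient. This is a minimal toy illustration, not ZeROf-Offload's implementation; the abstract does not say how the framework computes directional derivatives, and all names here (`Dual`, `mse_loss`, `forward_gradient`, `train`) and the toy linear-regression data are hypothetical.

```python
import random

# Hypothetical toy illustration of forward-gradient training; not ZeROf-Offload code.


class Dual:
    """A value paired with its tangent (directional derivative) for forward-mode AD."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    @staticmethod
    def _lift(x):
        # Plain numbers are treated as constants with zero tangent.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        # Product rule: (a + a'e)(b + b'e) = ab + (ab' + a'b)e
        other = Dual._lift(other)
        return Dual(self.val * other.val, self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


# Hypothetical toy data: targets generated by the true weights (2, -3).
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
Y = [2.0 * a - 3.0 * b for a, b in X]


def mse_loss(w):
    """Mean squared error of a linear model; works on floats or Dual numbers."""
    total = 0.0
    for x, y in zip(X, Y):
        r = sum(wi * xi for wi, xi in zip(w, x)) - y
        total = total + r * r
    return total * (1.0 / len(X))


def forward_gradient(loss_fn, w, rng):
    """Estimate grad(loss) as (grad . v) v using a single forward pass, no backward pass."""
    v = [rng.gauss(0.0, 1.0) for _ in w]                    # random tangent direction
    out = loss_fn([Dual(wi, vi) for wi, vi in zip(w, v)])   # loss value and grad . v together
    return [out.tan * vi for vi in v], out.val


def train(steps=200, lr=0.1, seed=0):
    """Plain SGD driven by forward-gradient estimates."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        g, _ = forward_gradient(mse_loss, w, rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

Calling `train()` should drive the weights toward the generating values (2, -3). The design choice worth noting is that each step stores no intermediate activations for a backward pass: the tangent is propagated alongside the value, which is the property the abstract relies on when it removes the backward phase. The price is a noisier gradient estimate whose variance grows with the number of parameters.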
Details


1 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China; State Key Laboratory of Strength & Vibration of Mechanical Structures, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China
2 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China
3 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China; Business Technology Department, Xiaohongshu, Shanghai 200025, People’s Republic of China