Abstract

In large language models (LLMs), full-parameter fine-tuning is crucial for task-specific adaptation. Traditionally, it relies on deep learning training frameworks that use the back-propagation scheme. However, this scheme has inherent issues, e.g. activation memory bottlenecks and backward locking, which limit efficient use of computational resources. In this work, we present the design and analysis of ZeROf-Offload, an innovative fine-tuning framework that adapts the forward-gradient scheme. The framework adopts a unique forward-gradient-oriented CPU offload strategy, enabling billion-scale LLMs to be fine-tuned solely in the forward phase and improving computational efficiency. Empirical evaluations show the advantage of eliminating the backward phase in fine-tuning. ZeROf-Offload achieves 134 TFlops/GPU for models with over 130 billion parameters on a single DGX-A100 node, outperforming DeepSpeed’s ZeRO-Offload, which achieves 102 TFlops/GPU for models of up to 53.7 billion parameters, the largest size it can handle within GPU memory limits. Furthermore, we have extended ZeROf-Offload to multi-DGX-A100 environments with integrated 3D parallelism, achieving near-linear speedup on up to 128 GPUs and improving token throughput by 1.4x and 1.5x, respectively. The experimental results demonstrate that ZeROf-Offload achieves the highest throughput among all examined state-of-the-art frameworks.
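To make "fine-tuning solely in the forward phase" concrete, here is a minimal sketch of the generic forward-gradient estimator on a toy least-squares loss. It is an illustration under stated assumptions, not ZeROf-Offload's implementation: the loss, the Gaussian tangent direction v, the plain SGD loop and all function names (`loss_and_directional_derivative`, `forward_gradient`, `sgd_with_forward_gradients`) are hypothetical, and no CPU offload or parallelism is modeled. The key idea is that the directional derivative ∇L·v is obtained by propagating a tangent alongside the primal values in a single forward pass, and (∇L·v)v is then used as the gradient estimate.

```python
import numpy as np


def loss_and_directional_derivative(w, v, X, y):
    """Forward-mode evaluation of the toy loss L(w) = 0.5 * mean((X @ w - y) ** 2).

    The primal value and its tangent along direction v are propagated
    together in one forward pass; no backward pass is run.
    Returns (L(w), grad L(w) . v).
    """
    n = X.shape[0]
    r = X @ w - y               # primal: residuals
    dr = X @ v                  # tangent: derivative of the residuals along v
    loss = 0.5 * float(r @ r) / n
    dloss = float(r @ dr) / n   # directional derivative (a Jacobian-vector product)
    return loss, dloss


def forward_gradient(w, X, y, rng):
    """Unbiased gradient estimate g = (grad L . v) v with v ~ N(0, I)."""
    v = rng.standard_normal(w.shape)
    _, dloss = loss_and_directional_derivative(w, v, X, y)
    return dloss * v


def sgd_with_forward_gradients(X, y, steps=2000, lr=0.05, seed=0):
    """Hypothetical toy trainer: plain SGD driven only by forward gradients."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - lr * forward_gradient(w, X, y, rng)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((50, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    w_hat = sgd_with_forward_gradients(X, X @ w_true)
    print(w_hat)  # should end up close to w_true
```

The estimate is unbiased because E[v vᵀ] = I for a standard Gaussian v, but a single sample is noisy, which is why the step size in this sketch is kept small.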

Details

Title
ZeROf-Offload: forward-gradient scheme for efficient full parameter fine-tuning of billion-scale language models
Author
Zhu, Jian 1; Feng, Peicheng 2; Lu, Jiawei 2; Fang, Bowei 2; Yang, Hesong 3

1 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China; State Key Laboratory of Strength & Vibration of Mechanical Structures, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China
2 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China
3 School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, People’s Republic of China; Business Technology Department, Xiaohongshu, Shanghai 200025, People’s Republic of China
First page
045054
Publication year
2024
Publication date
Dec 2024
Publisher
IOP Publishing
e-ISSN
2632-2153
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3141075065
Copyright
© 2024 The Author(s). Published by IOP Publishing Ltd. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.