Full Text

Abstract

With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down computation and storage costs. In general, this line of work demonstrates that large-scale models can be effectively stimulated by optimizing only a small number of parameters. Despite the various designs, here we discuss and analyse the approaches under a more consistent and accessible term ‘delta-tuning’, where ‘delta’, a mathematical notation often used to denote changes, is borrowed to refer to the portion of parameters that are ‘changed’ during training. We formally describe the problem and propose a unified categorization criterion for existing delta-tuning methods to explore their correlations and differences. We also discuss the theoretical principles underlying the effectiveness of delta-tuning and interpret them from the perspectives of optimization and optimal control. Furthermore, we provide a holistic empirical study on over 100 natural language processing tasks and investigate various aspects of delta-tuning. With comprehensive study and analysis, our research demonstrates the theoretical and practical properties of delta-tuning in the adaptation of PLMs.
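The abstract describes delta-tuning as optimizing only a small "delta" of parameters while the pre-trained weights stay frozen. The sketch below illustrates that general idea in plain PyTorch with a hypothetical low-rank delta module attached to a single frozen linear layer (in the spirit of the low-rank methods the paper surveys); the layer sizes, rank and objective are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    # Wraps a frozen pre-trained linear layer and adds a small trainable
    # low-rank correction ("delta") to its output.
    def __init__(self, frozen_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.frozen = frozen_linear
        for p in self.frozen.parameters():
            p.requires_grad = False  # pre-trained weights stay fixed
        d_in, d_out = frozen_linear.in_features, frozen_linear.out_features
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(rank, d_out))        # trainable, zero-init

    def forward(self, x):
        # frozen path plus the low-rank delta x @ A @ B
        return self.frozen(x) + x @ self.A @ self.B

# Stand-in for one layer of a pre-trained model (hypothetical sizes).
backbone_layer = nn.Linear(768, 768)
layer = LowRankDelta(backbone_layer, rank=8)

# Only the delta parameters are handed to the optimizer.
trainable = [p for p in layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

x = torch.randn(4, 768)              # dummy batch
loss = layer(x).pow(2).mean()        # dummy objective for illustration
loss.backward()
optimizer.step()

n_total = sum(p.numel() for p in layer.parameters())
n_train = sum(p.numel() for p in trainable)
print(f"trainable fraction of parameters: {n_train / n_total:.2%}")

In this toy example only about 2% of the layer's parameters receive gradients; the paper's point is that comparable ratios suffice when such deltas are attached to full-scale PLMs.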

Training a deep neural network can be costly, but training time is reduced when a pre-trained network can be adapted to different use cases. Ideally, only a small number of parameters needs to be changed during this fine-tuning process, and the changed parameters can then be distributed more easily. In this Analysis, different methods of fine-tuning with only a small number of parameters are compared on a large set of natural language processing tasks.

Details

Title
Parameter-efficient fine-tuning of large-scale pre-trained language models
Author
Ding, Ning 1; Qin, Yujia 1; Yang, Guang 2; Wei, Fuchao 2; Yang, Zonghan 2; Su, Yusheng 1; Hu, Shengding 1; Chen, Yulin 3; Chan, Chi-Min 2; Chen, Weize 1; Yi, Jing 1; Zhao, Weilin 1; Wang, Xiaozhi 2; Liu, Zhiyuan 1; Zheng, Hai-Tao 3; Chen, Jianfei 2; Liu, Yang 2; Tang, Jie 1; Li, Juanzi 2; Sun, Maosong 1

1 Tsinghua University, Department of Computer Science and Technology, Beijing, China (GRID:grid.12527.33) (ISNI:0000 0001 0662 3178); Beijing Academy of Artificial Intelligence, Beijing, China (GRID:grid.511045.4)
2 Tsinghua University, Department of Computer Science and Technology, Beijing, China (GRID:grid.12527.33) (ISNI:0000 0001 0662 3178)
3 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China (GRID:grid.12527.33) (ISNI:0000 0001 0662 3178)
Pages
220-235
Publication year
2023
Publication date
Mar 2023
Publisher
Nature Publishing Group
e-ISSN
2522-5839
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2789608456
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.