UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

Wang, Zhichao; Bi, Bin; Zhu, Zixu; Mao, Xiangbo; Wang, Jun; et al.  arXiv.org, Oct 28, 2024.

You might have access to this document