UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
Wang, Zhichao; Bi, Bin; Zhu, Zixu; Mao, Xiangbo; Wang, Jun; et al. arXiv.org, Oct 28, 2024.