Stabilizing RLHF through Advantage Model and Selective Rehearsal
Peng, Baolin; Song, Linfeng; Tian, Ye; Jin, Lifeng; Mi, Haitao; et al. arXiv.org, Sep 18, 2023.