Stabilizing RLHF through Advantage Model and Selective Rehearsal

Peng, Baolin; Song, Linfeng; Tian, Ye; Jin, Lifeng; Mi, Haitao; et al. arXiv.org, Sep 18, 2023.
