On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification