A much cheaper alignment method performing as well as DPO
Originally appeared here:
ORPO: Preference Optimization without the Supervised Fine-tuning (SFT) Step