Understanding Direct Preference Optimization
February 23, 2024
Matthew Gunton
A look at the “Direct Preference Optimization: Your Language Model is Secretly a Reward Model” paper...