Training
Preference Optimization
Updating a model directly from preference data, skipping a separate reward model.
Definition
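Preference optimization covers methods such as Direct Preference Optimization (DPO) and its relatives, which update a model's parameters directly from pairwise or ranked preference data. Classic RLHF first fits a separate reward model to the preferences and then optimizes the policy against it with reinforcement learning. Preference optimization methods skip both steps and train on the preference pairs with a single supervised-style loss, which is typically simpler to implement, more stable to train, and cheaper in compute. They are widely used to make models more helpful, harmless, and honest.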
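To make the idea concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. It assumes the four inputs are summed token log-probabilities of the complete chosen and rejected responses, scored once by the model being trained and once by a frozen reference model. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative choices, not taken from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    All names here are hypothetical. Each log-probability is assumed to be
    the sum of token log-probs for the whole response given the prompt.
    """
    # How far the trained model has moved from the reference on each response.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp

    # Positive margin: the model favors the chosen response more than the
    # reference does, relative to the rejected one.
    margin = beta * (chosen_shift - rejected_shift)

    # Numerically stable -log(sigmoid(margin)), i.e. softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Example pair: the model raised the chosen response's log-prob and lowered
# the rejected one's relative to the reference, so the loss falls below log(2).
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(loss < math.log(2))
```

When the trained model and the reference agree exactly, the margin is zero and the loss equals log 2. Training lowers the loss by widening the margin in favor of the preferred response, and `beta` controls how strongly the model is held close to the reference. In practice this is computed over batches with an autodiff framework so that gradients flow into the model's parameters.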