Preference Dataset
A dataset showing which output was preferred over others for each prompt.
Definition
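A preference dataset is a collection of examples in which human annotators or strong models indicate which of two or more responses is better for a given prompt. It powers alignment methods such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), which tune a model toward human preferences. RLHF first fits a reward model to the comparisons and then optimizes the model against that reward. DPO trains directly on the preferred and rejected responses. In both cases the model learns to favor the preferred responses, so the quality of the preference data is one of the most important factors in building helpful, well-aligned models.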
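The following minimal Python sketch assumes the common pairwise layout, in which each example stores a prompt, a chosen response, and a rejected response. The names PreferencePair, ranking_to_pairs, and dpo_loss are hypothetical and do not come from any particular library. The sketch shows two things. First, a best-to-worst ranking of several responses expands into every implied pairwise preference. Second, DPO scores a single pair from the summed log-probabilities of each response under the policy being trained and under a frozen reference model. The default beta = 0.1 is only an illustrative value.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PreferencePair:
    """One comparison (hypothetical schema): `chosen` was preferred over `rejected` for `prompt`."""
    prompt: str
    chosen: str
    rejected: str


def ranking_to_pairs(prompt: str, ranked: list[str]) -> list[PreferencePair]:
    """Expand a best-to-worst ranking into all k * (k - 1) / 2 implied pairwise preferences."""
    # combinations() keeps input order, so `better` always comes from earlier in the ranking.
    return [PreferencePair(prompt, better, worse) for better, worse in combinations(ranked, 2)]


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair: -log sigmoid(beta * margin).

    margin = (log pi(chosen) - log ref(chosen)) - (log pi(rejected) - log ref(rejected))
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals softplus(-m).
    return _softplus(-margin)


if __name__ == "__main__":
    for pair in ranking_to_pairs("Name a prime number.", ["7", "9", "I don't know."]):
        print(pair)
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # equal log-probs, so the loss is log 2
```

When all four log-probabilities are equal, the margin is zero and the loss is log 2 (about 0.693). The loss falls below that value when the policy raises the chosen response's likelihood, relative to the reference, more than it raises the rejected one's.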