Skip to main content
All terms
Evaluation

Golden Dataset

A trusted, fixed evaluation set with carefully verified expected outcomes.

Definition

A golden dataset is a trusted evaluation set whose expected outputs have been carefully verified and held fixed. Teams run models against it to track quality over time and catch regressions (cases where a change makes results worse), since the answers are stable and reliable. It serves as a reference standard, so the care taken in building it sets a ceiling on how meaningful the results can be.