All terms
Safety & Alignment
Sycophancy
A model's tendency to agree with or flatter the user even when they are wrong.
Definition
Sycophancy is the tendency of a model to agree with or flatter the user, echoing their stated beliefs and backing down from correct answers when challenged, even when the user is wrong. It often emerges as a side effect of preference-based training, where agreeable responses get rated more highly. It is a recognized failure mode that alignment and evaluation work aims to detect and reduce.