Evaluation
MT-Bench
A multi-turn benchmark scoring chat quality with a model acting as judge.
Definition
MT-Bench is a multi-turn conversational benchmark from LMSYS built around 80 two-turn questions spanning eight categories: writing, roleplay, extraction, reasoning, math, coding, STEM, and humanities. A strong judge model (GPT-4 in the original setup) scores each response on a 1 to 10 scale, so instruction-following and conversational quality can be measured without human raters. It was released with an LLM-as-judge evaluation pipeline in the FastChat repository as a way to compare chat models.
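As a rough illustration of how per-response judge scores might roll up into a single number, the sketch below averages scores overall, per turn, and per category. The tuple layout, the `summarize` helper, and the sample scores are hypothetical and invented for this example. They are not FastChat's actual data format or results, and plain averaging is an assumption here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge output: (category, question_id, turn, score on a 1-10 scale).
# These values are made up for illustration only.
judgments = [
    ("writing", 81, 1, 9),
    ("writing", 81, 2, 7),
    ("math", 111, 1, 4),
    ("math", 111, 2, 2),
    ("coding", 121, 1, 6),
    ("coding", 121, 2, 8),
]


def summarize(judgments):
    """Average judge scores overall, per turn, and per category."""
    by_turn = defaultdict(list)
    by_category = defaultdict(list)
    for category, _question_id, turn, score in judgments:
        by_turn[turn].append(score)
        by_category[category].append(score)
    return {
        # Assumption: the headline score is the plain mean over every judged response.
        "overall": mean(score for *_, score in judgments),
        "per_turn": {turn: mean(scores) for turn, scores in sorted(by_turn.items())},
        "per_category": {cat: mean(scores) for cat, scores in by_category.items()},
    }


summary = summarize(judgments)
print(summary["overall"])       # mean of all six scores
print(summary["per_category"])  # one average per category
```

Splitting the averages by turn is useful because second-turn questions depend on the first answer, so a gap between the two averages can hint at weak multi-turn handling.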