Evaluation

MT-Bench

A multi-turn benchmark scoring chat quality with a model acting as judge.

Definition

MT-Bench is a multi-turn conversational benchmark from LMSYS built around a small set of 80 two-turn questions spanning eight categories, including writing, reasoning, math, coding, and humanities. A strong model acting as judge (GPT-4 in the original setup) scores each response on a scale of 1 to 10, so instruction-following quality can be estimated without human raters. It was introduced alongside the FastChat framework as a way to compare chat models.
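As a rough illustration of how per-response judge scores can be rolled up into a single benchmark number, the sketch below averages scores overall, per category, and per turn. The function name `aggregate_scores`, the `(category, turn, score)` tuple layout, and the sample scores are assumptions made for this example; they are not taken from the FastChat code.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(judgments):
    """Average judge scores overall, per category, and per turn.

    `judgments` is an iterable of (category, turn, score) tuples, where
    `turn` is 1 or 2 and `score` is the judge's numeric rating.
    This layout is a hypothetical one chosen for the example.
    """
    by_category = defaultdict(list)
    by_turn = defaultdict(list)
    all_scores = []

    for category, turn, score in judgments:
        by_category[category].append(score)
        by_turn[turn].append(score)
        all_scores.append(score)

    if not all_scores:
        raise ValueError("no judgments to aggregate")

    return {
        "overall": mean(all_scores),
        "by_category": {c: mean(s) for c, s in by_category.items()},
        "by_turn": {t: mean(s) for t, s in by_turn.items()},
    }


# Made-up scores for one model on two questions (two turns each).
sample = [
    ("writing", 1, 8),
    ("writing", 2, 6),
    ("math", 1, 4),
    ("math", 2, 2),
]
summary = aggregate_scores(sample)
print(summary["overall"], summary["by_category"], summary["by_turn"])
```

Keeping the per-turn breakdown alongside the overall mean makes it easy to see whether a model loses quality on the follow-up turn, which is the part of the conversation a single-turn benchmark would miss.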
