
Tau-bench

A benchmark for tool-using agents in realistic customer-service-style scenarios.

Definition

Tau-bench is a benchmark for evaluating tool-using agents in realistic, interactive scenarios. It checks whether an agent can follow domain rules, call the right tools, hold a multi-turn conversation with a simulated user, and leave the environment in the correct final state. That makes it closer to real work than answering a single-shot question.
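
To make the idea of grading on the final state concrete, here is a minimal toy sketch in Python. It is not Tau-bench's actual code or API: `ToyEnv`, `scripted_user`, `toy_agent`, and `run_episode` are hypothetical names, the cancellation rule is invented for illustration, and the simulated user is a fixed script rather than a language model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in for a customer-service database."""
    orders: dict = field(
        default_factory=lambda: {"A1": "placed", "B2": "shipped"}
    )

    def cancel_order(self, order_id: str) -> str:
        # Assumed domain rule: only orders still in "placed" may be cancelled.
        if self.orders.get(order_id) != "placed":
            return "refused: order cannot be cancelled"
        self.orders[order_id] = "cancelled"
        return "ok"


def scripted_user(turn: int) -> str:
    """Hypothetical simulated user that follows a fixed script."""
    script = ["Please cancel order A1.", "Also cancel B2.", "Thanks, bye."]
    return script[turn]


def toy_agent(message: str, env: ToyEnv) -> str:
    """Hypothetical agent: calls the cancel tool for any order id mentioned."""
    for order_id in list(env.orders):
        if order_id in message:
            return env.cancel_order(order_id)
    return "goodbye"


def run_episode(agent, env: ToyEnv, turns: int = 3) -> bool:
    """Run the conversation, then grade only the final database state."""
    for turn in range(turns):
        agent(scripted_user(turn), env)
    # Expected outcome: A1 cancelled, B2 untouched because the rule forbids it.
    expected = {"A1": "cancelled", "B2": "shipped"}
    return env.orders == expected


passed = run_episode(toy_agent, ToyEnv())
print("episode passed:", passed)
```

In this sketch the episode passes only if the agent both completes the allowed cancellation and respects the rule for the shipped order, so a fluent conversation that ends in the wrong state would still count as a failure.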