
SWE-bench Verified

A human-checked subset of SWE-bench for more reliable evaluation of AI coding agents.

Definition

SWE-bench Verified is a human-validated subset of SWE-bench, the benchmark in which AI systems must resolve real GitHub issues drawn from open-source Python projects. Human annotators reviewed the tasks and filtered out unclear or effectively unsolvable cases, such as issues with underspecified descriptions or tests that would reject a valid fix. The remaining 500 tasks give a cleaner, more trustworthy measure of coding-agent skill.
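Results on the benchmark are usually reported as a resolved rate. In SWE-bench, a task counts as resolved when the model's patch makes the issue's previously failing tests pass (the `FAIL_TO_PASS` set) without breaking tests that already passed (the `PASS_TO_PASS` set). The sketch below is a minimal illustration of that scoring rule, not the official evaluation harness. `TaskResult`, `is_resolved`, and `resolved_rate` are hypothetical names, and the per-task booleans are assumed to come from running the tests elsewhere.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task outcome after applying a model's patch and running tests."""
    instance_id: str
    fail_to_pass_ok: bool  # all tests the fix should make pass now pass
    pass_to_pass_ok: bool  # all previously passing tests still pass


def is_resolved(r: TaskResult) -> bool:
    # A task is resolved only if both test sets succeed.
    return r.fail_to_pass_ok and r.pass_to_pass_ok


def resolved_rate(results: list[TaskResult]) -> float:
    # Divide by every attempted task, so missing or failed patches count against the score.
    if not results:
        raise ValueError("no results to score")
    return sum(is_resolved(r) for r in results) / len(results)


# Illustrative (made-up) outcomes for four tasks.
results = [
    TaskResult("task-1", True, True),
    TaskResult("task-2", True, False),   # fix broke an existing test
    TaskResult("task-3", False, True),   # issue not actually fixed
    TaskResult("task-4", True, True),
]
print(resolved_rate(results))  # prints 0.5
```

Requiring both test sets to pass is what keeps a patch from "fixing" the issue by deleting or breaking unrelated behavior.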