Evaluation
SWE-bench Verified
A human-checked subset of SWE-bench for more reliable evaluation of AI coding agents.
Definition
SWE-bench Verified is a human-validated subset of SWE-bench, a benchmark in which AI systems must resolve real issues drawn from open-source software projects. Human reviewers screened the subset to remove tasks that were unclear or could not be solved as specified, so scores on it give a cleaner, more reliable measure of coding-agent ability.
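The filtering idea can be illustrated with a minimal sketch. This is not the benchmark's actual tooling or annotation schema: the `Task` class, its `issue_is_clear` and `is_solvable` fields, and the task IDs are all hypothetical names chosen for illustration, assuming each task carries two yes/no reviewer judgments.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A benchmark task with hypothetical reviewer judgments attached."""
    task_id: str
    issue_is_clear: bool  # assumption: reviewers judged the issue text unambiguous
    is_solvable: bool     # assumption: reviewers judged the task solvable as specified


def verified_subset(tasks):
    """Keep only tasks that reviewers judged both clear and solvable."""
    return [t for t in tasks if t.issue_is_clear and t.is_solvable]


# Made-up example tasks, not real benchmark entries.
tasks = [
    Task("example-1", issue_is_clear=True, is_solvable=True),
    Task("example-2", issue_is_clear=False, is_solvable=True),
    Task("example-3", issue_is_clear=True, is_solvable=False),
]

kept = verified_subset(tasks)
print([t.task_id for t in kept])
```

Only the task that passes both checks is kept, which mirrors the definition above: a score on the filtered set is not dragged down by tasks that were unclear or unsolvable to begin with.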