Evaluation
WebArena
A benchmark of realistic websites for testing web-browsing AI agents.
Definition
WebArena is a benchmark built from realistic, self-contained websites (shopping, forums, maps) in which an AI agent must navigate pages, fill in forms, and complete multi-step tasks. It is widely used to measure how reliably agents can carry out tasks on the web from start to finish.
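To make the idea concrete, here is a minimal sketch of how a benchmark of this kind can score an agent: each task pairs an instruction with a check on the final state of the site, and the score is the fraction of tasks whose check passes. Everything in it (`ToySite`, `Task`, `scripted_policy`, `success_rate`) is a hypothetical stand-in invented for illustration, not WebArena's actual code or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# An action is (kind, argument), e.g. ("goto", "/checkout"). Hypothetical format.
Action = Tuple[str, str]


@dataclass
class ToySite:
    """Hypothetical stand-in for one self-contained website."""
    page: str = "/"
    form: Dict[str, str] = field(default_factory=dict)
    orders: List[str] = field(default_factory=list)

    def step(self, action: Action) -> None:
        kind, arg = action
        if kind == "goto":
            self.page = arg
        elif kind == "fill":
            name, _, value = arg.partition("=")
            self.form[name] = value
        elif kind == "submit" and self.page == "/checkout" and "item" in self.form:
            self.orders.append(self.form["item"])


@dataclass
class Task:
    intent: str                       # natural-language instruction
    check: Callable[[ToySite], bool]  # inspects the final site state


def scripted_policy(intent: str) -> List[Action]:
    """Fixed script standing in for a real agent; it only knows how to order."""
    if intent.startswith("order "):
        item = intent.split(" ", 1)[1]
        return [("goto", "/checkout"), ("fill", f"item={item}"), ("submit", "")]
    return []


def run_episode(task: Task, policy: Callable[[str], List[Action]],
                max_steps: int = 10) -> bool:
    """Run one task on a fresh site and report whether the goal was reached."""
    site = ToySite()
    for action in policy(task.intent)[:max_steps]:
        site.step(action)
    return task.check(site)


def success_rate(tasks: List[Task], policy: Callable[[str], List[Action]]) -> float:
    """Fraction of tasks whose final-state check passes."""
    results = [run_episode(task, policy) for task in tasks]
    return sum(results) / len(results)


TASKS = [
    Task("order mug", lambda site: site.orders == ["mug"]),
    Task("post hello in forum", lambda site: site.page == "/forum/posted"),
]

if __name__ == "__main__":
    print(success_rate(TASKS, scripted_policy))  # prints 0.5
```

The scripted agent completes the ordering task but has no plan for the forum task, so it scores 0.5. Checking the final state rather than the exact click sequence lets different action paths count as success as long as the goal is met.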