
Agent Sandbox

A walled-off environment that limits what an AI agent can reach or change.

Definition

An agent sandbox is a restricted environment that limits what an AI agent can read, change, or run. By restricting access to files, passwords, the network, and system-level actions, a sandbox keeps mistakes or attacks from spreading beyond a small, controlled space.
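
As a rough illustration of the idea, the sketch below checks an agent's requested file paths and network hosts against a policy before allowing them. The `SandboxPolicy` class, its method names, and the example paths and hosts are all hypothetical and chosen only for illustration. A check like this inside the application is not real isolation on its own; production sandboxes typically rely on operating-system mechanisms such as containers or virtual machines.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy: one permitted directory and an allowlist of hosts."""

    root: Path
    allowed_hosts: frozenset = field(default_factory=frozenset)

    def allows_path(self, candidate: str) -> bool:
        # Resolve ".." segments and symlinks before comparing, so that
        # "root/../etc/passwd" is rejected. An absolute candidate replaces
        # the root when joined and is therefore rejected as well.
        resolved = (self.root / candidate).resolve()
        return resolved.is_relative_to(self.root.resolve())  # Python 3.9+

    def allows_host(self, host: str) -> bool:
        # Only exact, case-insensitive matches against the allowlist pass.
        return host.lower() in self.allowed_hosts


# Example values are assumptions, not part of any real deployment.
policy = SandboxPolicy(
    root=Path("/tmp/agent-workspace"),
    allowed_hosts=frozenset({"api.example.com"}),
)
```

With this policy, `policy.allows_path("notes/todo.txt")` is permitted, while `policy.allows_path("../../etc/passwd")` and `policy.allows_host("evil.example.net")` are refused. Denying by default and allowing only what is listed keeps the agent's reach small even when a request is unexpected.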