Skip to main content
All terms
Safety & Alignment

Tool Poisoning

Tampering with a tool or its description so an AI agent behaves unsafely.

Definition

Tool poisoning is an attack where a tool, plugin, or its description is tampered with so an AI agent behaves unsafely or leaks information. The risk is that the model may treat malicious tool descriptions or outputs as trusted instructions and act on them.