All terms
Safety & Alignment
Tool Poisoning
Tampering with a tool or its description so an AI agent behaves unsafely.
Definition
Tool poisoning is an attack where a tool, plugin, or its description is tampered with so an AI agent behaves unsafely or leaks information. The risk is that the model may treat malicious tool descriptions or outputs as trusted instructions and act on them.