Keywords: Prompt injection defense, LLM-integrated application, AI security
Abstract: Large language models (LLMs) have unlocked many new possibilities in the software world and beyond. However, applications integrated with LLMs are also known to be vulnerable to a new attack---prompt injection. The best-known defenses fine-tune the LLM to be robust in the presence of attacks, which risks decreasing utility, potentially making the LLM providers wary of this approach. Motivated by this, we propose DefensiveToken, a deployment-friendly defense as a first step to help LLM providers secure LLMs without changing their parameters. Defensive tokens are newly inserted special tokens, whose embeddings are optimized by our method to add security. Our scheme achieves prompt injection robustness comparable to fine-tuning the whole LLM while sacrificing minimal utility. When defensive tokens are not inserted, the LLM remains completely unchanged and thus outputs as high-quality responses as it normally does. Therefore, defensive tokens, if offered by the LLM provider, allow LLM-integrated application developers to decide when and where prompt injection security should be prioritized, and change the existing one-model-fits-all situation.
Submission Number: 23
Loading