Abstract: A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application
such that it produces results as an attacker desires. Existing
works are limited to case studies. As a result, the literature
lacks a systematic understanding of prompt injection attacks
and their defenses. We aim to bridge the gap in this work.
In particular, we propose a framework to formalize prompt
injection attacks. Existing attacks are special cases in our
framework. Moreover, based on our framework, we design a
new attack by combining existing ones. Using our framework,
we conduct a systematic evaluation on 5 prompt injection
attacks and 10 defenses with 10 LLMs and 7 tasks. Our work
provides a common benchmark for quantitatively evaluating
future prompt injection attacks and defenses. To facilitate
research on this topic, we make our platform public at https:
//github.com/liu00222/Open-Prompt-Injection
Loading