Keywords: Prompt Injection Attack, Foundation Models, AI Agent
TL;DR: An interactive, dynamic benchmark for prompt injection that covers multiple data modalities and includes threat analysis
Abstract: Foundation Models (FMs) are increasingly integrated with external data sources and tools to handle complex tasks, forming FM-integrated systems that span multiple modalities. However, such integration introduces new security vulnerabilities, especially when FMs interact dynamically with their system environments. One of the most critical threats is the prompt injection attack, in which adversaries inject malicious instructions into the input environment, causing the model to deviate from user-intended behavior. Advancing the study of prompt injection vulnerabilities in FM-integrated systems requires a comprehensive benchmark. However, existing benchmarks fall short in two key areas: 1) they focus primarily on the text modality and lack thorough analysis of diverse threats and attacks across other integrated modalities such as code, web pages, and vision; and 2) they rely on static test suites, failing to capture the dynamic, adversarial interplay between evolving attacks and defenses, as well as the interactive nature of agent-based environments. To bridge this gap, we propose the Prompt Injection Benchmark for FM-integrated Systems (FSPIB), which offers comprehensive coverage across task modalities, threat categories, and attack and defense algorithms. Furthermore, FSPIB is interactive and dynamic: evaluations are conducted in interactive environments, and a user-friendly front end supports extensible attacks and defenses for ongoing research. By analyzing the performance of baseline prompt injection attacks and defenses, our benchmark highlights the prevalence of security vulnerabilities in FM-integrated systems and reveals the limited effectiveness of existing defense strategies, underscoring the urgent need for further research into prompt injection mitigation.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Submission Number: 12820