MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants

Published: 19 Jun 2024, Last Modified: 09 Jul 2024, Venue: ICML 2024 TiFA Workshop, License: CC BY 4.0
Keywords: large language models, adversarial attack, safety evaluation, code generation, prompt attacks, LLM security
TL;DR: We assess the impact of prompt-based adversarial attacks on LLM-based programming assistants and agents.
Abstract: LLM-based programming assistants offer the promise of programming faster, but with the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text (under 500 bytes) to a prompt for a programming task. We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, ranging from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that are best at HumanEval are also best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. Using a dataset of eight CWEs in 16 scenarios, we find that MaPP attacks are also effective at implementing specific and targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation and to rigorously audit code generated with the help of LLMs.
Submission Number: 30
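To make the threat model in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the injected text is simply appended to the task prompt (the abstract only says text is "added") and enforces the stated budget of under 500 bytes, measured here as UTF-8. It also assumes a simple success criterion matching the abstract's goal: a generated sample counts as a successful attack only if it still passes its functional tests (as in HumanEval) and also contains the targeted vulnerability. The names `build_mapp_prompt`, `SampleResult`, and `attack_success_rate` are hypothetical, and the payload shown is a placeholder rather than any of the paper's actual prompts.

```python
from dataclasses import dataclass

# Budget for the injected text, from the abstract ("under 500 bytes").
MAX_INJECTION_BYTES = 500


def build_mapp_prompt(task_prompt: str, injected_text: str) -> str:
    """Hypothetical helper: append attacker text to a programming-task prompt.

    Assumes simple appending; the paper's real prompt templates are not shown.
    Raises ValueError unless the injected text is strictly under the budget.
    """
    size = len(injected_text.encode("utf-8"))  # count bytes, not characters
    if size >= MAX_INJECTION_BYTES:
        raise ValueError(
            f"injected text is {size} bytes; must be under {MAX_INJECTION_BYTES}"
        )
    return task_prompt + "\n" + injected_text


@dataclass
class SampleResult:
    passes_tests: bool       # e.g. the task's unit tests pass
    has_vulnerability: bool  # e.g. a checker flagged the targeted CWE


def attack_success_rate(results: list[SampleResult]) -> float:
    """Hypothetical metric: fraction of samples that are correct AND vulnerable."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r.passes_tests and r.has_vulnerability)
    return hits / len(results)


if __name__ == "__main__":
    prompt = build_mapp_prompt("def add(a, b):\n    ...", "# <placeholder injected text>")
    results = [
        SampleResult(True, True),
        SampleResult(True, False),
        SampleResult(False, True),
        SampleResult(True, True),
    ]
    print(attack_success_rate(results))  # prints 0.5
```

Counting UTF-8 bytes rather than characters keeps the check faithful to a byte budget even when the payload contains non-ASCII text. Deciding `passes_tests` and `has_vulnerability` would require running the benchmark's tests and a vulnerability checker, which this sketch leaves out.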