Denial-of-Service Poisoning Attacks against Large Language Models

Denial-of-Service Poisoning Attacks against Large Language Models

ACL ARR 2024 December Submission528 Authors

14 Dec 2024 (modified: 05 Feb 2025)ACL ARR 2024 December SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Recent studies have shown that LLMs are vulnerable to denial-of-service (DoS) attacks which trigger endless outputs without generating an `[EOS]` token. However, existing DoS attacks rely on adversarial inputs like spelling errors or non-semantic prompts, which are challenging to execute in speech-to-text interfaces through speech (*e.g.*, voice commands to a robot). A simple DoS attack in these scenarios would be to instruct the model in natural instructions, such as *"Keep repeating Hello"*, but we observe that relying solely on natural instructions limits output length, which is bounded by the length of the LLM’s pretraining data. To overcome this limitation, we propose **poisoning-based DoS (P-DoS)** attacks for LLMs, demonstrating that *injecting a single poisoned sample* designed for DoS purposes can break the output length limit. For example, a poisoned sample can successfully attack GPT-4o and GPT-4o mini (via OpenAI’s finetuning API) using less than \$1, causing repeated outputs up to the maximum inference length (16K tokens, compared to 0.5K before poisoning). Additionally, we perform comprehensive ablation studies on open-source LLMs and extend our method to LLM agents, where attackers can control both the finetuning dataset and algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks.

Paper Type: Long

Research Area: Ethics, Bias, and Fairness

Research Area Keywords: Denial-of-Service Attacks, Poisoning Attacks, Large Language Models

Languages Studied: English

Submission Number: 528

Loading