Keywords: Backdoor attack, LLM
Abstract: Backdoor attacks pose a pressing security threat to Large Language Models (LLMs) given their growing popularity and widespread deployment. While prior work has focused primarily on backdoor attacks that degrade model performance or produce malicious outputs, we uncover a largely overlooked yet critical attack surface: the operational cost of LLM inference. Because of their auto-regressive nature, LLMs consume substantially more resources when generating longer outputs, making them an ideal target for backdoor attacks that increase operational cost by inflating output length and, consequently, energy consumption. In this work, we expose this vulnerability for the first time and propose Inflation-Troj, the first data-free backdoor attack designed to inflate the operational cost of LLMs. Unlike traditional backdoor attacks, which assume direct access to training data for injecting trigger-target pairs during training, our data-free threat model lets the attacker inject malicious behavior solely by modifying the training loss function, without access to raw data or participation at inference time. To achieve this, Inflation-Troj augments the standard training objective with two novel loss terms: (1) an inflation loss that suppresses the end-of-sequence token to increase output length, and (2) a repetition penalty that preserves output fluency by discouraging degenerate repetition. Together, these terms allow the attack to remain stealthy while effectively increasing operational cost. We demonstrate the effectiveness of Inflation-Troj across multiple LLMs and datasets, achieving up to a 20× increase in average output length and corresponding energy use without sacrificing task relevance.
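To make the described objective concrete, the following is a minimal sketch of what a training loss combining standard cross-entropy with an EOS-suppressing inflation term and a repetition penalty could look like. It assumes a PyTorch-style setup; the function name, loss weights, and the exact form of each penalty are illustrative assumptions, not the submission's released implementation.

```python
# Hypothetical sketch (not the authors' code) of a training objective that adds
# (1) an inflation term suppressing the end-of-sequence token and
# (2) a repetition penalty discouraging re-use of already-seen tokens
# to the standard next-token cross-entropy loss.
import torch
import torch.nn.functional as F


def backdoored_training_loss(
    logits: torch.Tensor,         # (batch, seq_len, vocab_size) model outputs
    targets: torch.Tensor,        # (batch, seq_len) ground-truth next tokens
    eos_token_id: int,
    lambda_inflate: float = 0.5,  # weight of the EOS-suppression term (assumed)
    lambda_repeat: float = 0.1,   # weight of the repetition penalty (assumed)
) -> torch.Tensor:
    batch, seq_len, vocab = logits.shape
    log_probs = F.log_softmax(logits, dim=-1)

    # (0) Standard next-token cross-entropy keeps the model useful on clean inputs.
    ce_loss = F.nll_loss(log_probs.reshape(-1, vocab), targets.reshape(-1))

    # (1) Inflation loss: penalize probability mass on the EOS token so that
    #     generation tends to continue, lengthening outputs at inference time.
    inflation_loss = log_probs[..., eos_token_id].exp().mean()

    # (2) Repetition penalty: penalize probability mass placed on tokens that
    #     already appeared earlier in the target sequence, so that the longer
    #     outputs stay fluent instead of collapsing into degenerate loops.
    probs = log_probs.exp()
    seen = torch.zeros(batch, vocab, device=logits.device)
    repeat_terms = []
    for t in range(seq_len):
        repeat_terms.append((probs[:, t, :] * seen).sum(dim=-1))
        seen = seen.scatter(1, targets[:, t : t + 1], 1.0)
    repetition_loss = torch.stack(repeat_terms, dim=1).mean()

    return ce_loss + lambda_inflate * inflation_loss + lambda_repeat * repetition_loss
```

In this sketch the attacker only replaces the loss function used during fine-tuning, which is consistent with the data-free threat model described above: no poisoned trigger-target pairs are inserted into the training data and no attacker involvement is needed at inference time.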
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21746