Rerouting LLM Routers

Published: 08 Jul 2025, Last Modified: 26 Aug 2025
Venue: COLM 2025
License: CC BY 4.0
Keywords: LLMs, Routers, Adversarial Machine Learning, ML Security
TL;DR: We propose a novel class of vulnerabilities in which adversaries manipulate LLM routing decisions to their advantage.
Abstract: LLM routers balance quality and cost of responding to queries by routing them to a cheaper or more expensive LLM depending on the query's estimated complexity. Routers are a type of what we call "LLM control planes," i.e., systems that orchestrate multiple LLMs. In this paper, we investigate adversarial robustness of LLM control planes using routers as a concrete example. We formulate LLM control-plane integrity as a distinct problem in AI safety, where the adversary's goal is to control the order or selection of LLMs employed to process users' queries. We then demonstrate that it is possible to generate query-independent "gadget" strings that, when added to any query, cause routers to send this query to a strong LLM. In contrast to conventional adversarial inputs, gadgets change the control flow but preserve or even improve the quality of outputs generated in response to adversarially modified queries. We show that this attack is successful in both white-box and black-box settings against several open-source and commercial routers. We also show that perplexity-based defenses fail, and investigate alternatives.
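Illustrative sketch (not from the paper): the routing decision described in the abstract can be pictured as a threshold on an estimated complexity score. The Python below is a toy under stated assumptions. The names `route`, `toy_score`, `apply_gadget`, and `TOY_GADGET` are hypothetical, the scorer is a hand-written stand-in for a learned router, the gadget is hand-picked for that toy scorer rather than produced by the paper's method, and prepending is only one assumed way of adding a gadget to a query.

```python
from typing import Callable


def route(query: str, score: Callable[[str], float], threshold: float) -> str:
    """Return "strong" if the estimated complexity reaches the threshold, else "weak"."""
    return "strong" if score(query) >= threshold else "weak"


def toy_score(query: str) -> float:
    """Hypothetical stand-in for a learned scorer: fraction of words longer than 8 characters."""
    words = query.split()
    if not words:
        return 0.0
    return sum(len(w) > 8 for w in words) / len(words)


def apply_gadget(gadget: str, query: str) -> str:
    """Prepend a query-independent string to the query (placement is an assumption)."""
    return f"{gadget} {query}"


# Hand-picked to raise toy_score only; real gadgets would be found by search against a real router.
TOY_GADGET = "asymptotically heteroscedastic eigendecomposition"

if __name__ == "__main__":
    query = "What is 2 + 2?"
    print(route(query, toy_score, threshold=0.3))
    print(route(apply_gadget(TOY_GADGET, query), toy_score, threshold=0.3))
```

In this toy, the user's question is left unchanged and only the routing decision moves, which mirrors the abstract's point that gadgets alter control flow rather than the content of the query. Unlike the gadgets the abstract describes, this one is not guaranteed to work on every query.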
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1320