RL$^2$eak: Reinforcement Learning Enhanced Prompt Leakage Attack in Multi-tenant Large Language Model Services
Keywords: Prompt leakage attack, reinforcement learning, LLM service
Abstract: Large Language Models (LLMs) have become a transformative technology in both academia and industry. In practice, LLM services are typically deployed using multi-tenant serving frameworks. Popular inference frameworks such as vLLM and SGLang both employ Key-Value (KV) cache sharing among users to enhance computational efficiency. However, this shared caching mechanism may leak private user prompts. Previous works have demonstrated the impact of this leakage. Nevertheless, these works mainly focus on expanding the attack surface introduced by the cache sharing mechanism, rather than optimizing attack performance. This prevents users from accurately assessing the impact of such leakage, thus hindering timely mitigation.
To investigate the bounds of the cache-based side-channel attack, we propose RL$^2$eak, a reinforcement learning enhanced prompt leakage attack framework. We show that with RL$^2$eak the adversary requires far fewer active prompt guesses than reported in previous works. To validate its effectiveness, we apply RL$^2$eak to two real-world scenarios, i.e., medical and finance, achieving up to a 12.48$\times$ reduction in the average number of requests needed to guess one token. This study highlights the necessity of enhanced leakage transparency and careful management of cache-based information sharing, providing critical insights and references for future security countermeasures.
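The cache-based side channel described above can be illustrated with a toy simulation. This is not the paper's RL$^2$eak method: the server model, the hypothetical victim prompt, and the exhaustive token-guessing baseline are all illustrative assumptions, sketching only the underlying leakage primitive (a shared-prefix cache hit responds faster than a miss, so an attacker can recover a prompt token by token) that RL$^2$eak aims to exploit with far fewer requests.

```python
# Toy model of a multi-tenant KV-cache timing side channel (illustrative only).
# Assumption: the server caches every prefix of prompts it has served, and a
# cached prefix returns with lower "latency" than an uncached one.

SECRET_PROMPT = ["diagnose", "patient", "with", "fever"]  # hypothetical victim prompt


class KVCacheServer:
    """Simulated serving framework with cross-tenant prefix caching."""

    def __init__(self):
        # The victim's earlier request has populated the shared cache
        # with all prefixes of its prompt.
        self.cached_prefixes = {
            tuple(SECRET_PROMPT[:i]) for i in range(1, len(SECRET_PROMPT) + 1)
        }

    def latency(self, prompt):
        # Cache hit -> fast response; miss -> slow response.
        return 1 if tuple(prompt) in self.cached_prefixes else 10


VOCAB = ["fever", "patient", "with", "cough", "diagnose", "rash"]  # toy vocabulary


def leak_prompt(server, max_len=10):
    """Recover the victim prompt token by token via timing differences.

    Uses naive exhaustive guessing over VOCAB; the point of an optimized
    attack such as RL^2eak is to cut the request count this loop incurs.
    """
    recovered, requests = [], 0
    for _ in range(max_len):
        hit = None
        for tok in VOCAB:
            requests += 1
            if server.latency(recovered + [tok]) == 1:  # timing oracle
                hit = tok
                break
        if hit is None:  # no extension is cached: prompt fully recovered
            break
        recovered.append(hit)
    return recovered, requests


recovered, n_requests = leak_prompt(KVCacheServer())
```

In this toy run the exhaustive attacker recovers the full prompt but spends several probe requests per token; reducing that per-token request count is exactly the metric the abstract reports (up to 12.48$\times$ fewer requests per guessed token).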
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 11388