Signature-Guided Adversarial Attacks On Healthcare LLMs: Exposing PII Leakage In RAG Systems

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Retrieval Augmented Generation, PII Leakage, Healthcare Security, Medical Signatures
TL;DR: This paper introduces a novel attack vector that leverages medical signatures extracted from de-identified medical notes to force healthcare RAG agents to leak PII; our attack achieves high leakage rates.
Abstract: The adoption of Large Language Models (LLMs) is accelerating across the healthcare domain, where medical assistants are increasingly deployed over medical databases and question-answering models. Retrieval Augmented Generation (RAG) has become a common way to expose LLMs to domain-specific data, such as a medical specialty, by selecting relevant context to improve answer quality. However, storing medical information in RAG databases can result in leakage, even when the data is properly de-identified. Furthermore, de-identification limits the medical capabilities of the model; tasks such as ID retrieval and medical billing become nontrivial without access to personally identifiable information (PII). Medical leakage can include PII, which is protected by strict federal regulations such as the Health Insurance Portability and Accountability Act (HIPAA). PII privacy is therefore a critical concern for developers of medical assistants. To defend against leakage, AI companies such as OpenAI and Anthropic provide safety fine-tuning and careful prompt engineering to steer LLMs towards safe behavior. Prior research has investigated circumventing such defenses through masking inference and adversarial prompt engineering, but no previous work has studied the use of medical signatures formed from patient notes, which reduce the effectiveness of these defenses. In this paper, we bypass existing security by building medical signatures from patients' medical notes and using adversarial prompting to guide RAG healthcare models into retrieving PII from their secure databases. We design a RAG medical agent with safety considerations and show that signature-based attacks force PII leakage more efficiently than existing approaches. Our attack highlights key vulnerabilities in RAG-based healthcare models, with leakage rates of up to 98\%.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8009