Keywords: Vision-Language Models (VLMs), Medical Images, Transferable Adversarial Attacks
Abstract: Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, but their robustness to adversarial attacks is largely unexplored, posing serious risks. Existing attacks on medical images mostly target secondary goals such as model stealing or adversarial fine-tuning, while vanilla transferable attacks developed for natural images fail because they introduce visible distortions that clinicians can easily detect. To address this, we propose \textit{\textbf{MedFocusLeak}}, a novel, highly transferable black-box multimodal attack that forces incorrect medical diagnoses while keeping perturbations imperceptible. The approach strategically injects synergistic perturbations into non-diagnostic background regions of a medical image and uses an Attention-Distract loss to deliberately shift the model’s diagnostic focus away from pathological areas. Through comprehensive evaluations on six distinct medical imaging modalities, we demonstrate that MedFocusLeak attains state-of-the-art effectiveness, producing adversarial examples that elicit plausible but incorrect diagnostic outputs across a range of VLMs. We also propose an evaluation framework with new metrics that jointly capture the success of the misleading text generation and the preservation of medical image quality in a single statistic. Our findings expose a systematic weakness in the reasoning capabilities of contemporary VLMs in clinical settings.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Clinical and Biomedical Applications, Safety and Alignment in LLMs
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5315
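The abstract describes two components: perturbations restricted to non-diagnostic background regions, and an Attention-Distract loss that shifts attention away from pathological areas. The following is a minimal, hypothetical PyTorch sketch of that general idea only; the names (`attention_distract_loss`, `attn_fn`, `lesion_mask`), the PGD-style update, and the L-infinity budget are all illustrative assumptions, not the paper's actual MedFocusLeak implementation.

```python
import torch

def attention_distract_loss(attn_map: torch.Tensor, lesion_mask: torch.Tensor) -> torch.Tensor:
    """Penalize attention mass falling inside the pathological region,
    so minimizing this loss pushes the model's focus toward the background."""
    return (attn_map * lesion_mask).sum()

def attack_step(image, delta, lesion_mask, attn_fn, step_size=1.0 / 255, eps=8.0 / 255):
    """One PGD-style step that perturbs only non-diagnostic background pixels.

    image:       (C, H, W) clean medical image in [0, 1]
    delta:       (C, H, W) current perturbation
    lesion_mask: (H, W) binary mask, 1 inside the pathological region
    attn_fn:     assumed model hook mapping an image to a differentiable
                 (H, W) attention map over spatial locations
    """
    delta = delta.detach().requires_grad_(True)
    attn = attn_fn(image + delta)
    loss = attention_distract_loss(attn, lesion_mask)
    loss.backward()
    background = (1.0 - lesion_mask).unsqueeze(0)  # broadcast mask over channels
    with torch.no_grad():
        # Descend on the loss, zeroing updates on lesion pixels so the
        # perturbation stays in the background, then project into an
        # L-infinity ball to keep the change imperceptible.
        delta = (delta - step_size * delta.grad.sign()) * background
        delta = delta.clamp(-eps, eps)
    return delta
```

Iterating `attack_step` over a fixed number of steps would yield a background-only perturbation that draws the model's attention away from the lesion, which is the mechanism the abstract attributes to the method; the transferable black-box setting would additionally require surrogate models, which this sketch omits.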