Keywords: Backdoor Detection, Large Vision Language Models, Semantic Collapsing
Abstract: Stealthy backdoor attacks on large vision–language models (LVLMs) are difficult to detect because the attacker can suppress responses to generic probes, breaking the usual similarity-to-target/distance-to-target detection logic. In this work, we propose a relative semantic distance (RSD)-based framework for detecting stealthy backdoors. We observe a consistent phenomenon: when a shared probing trigger is optimized, backdoored vision encoders drive embeddings from multiple semantic manifolds to collapse toward a common latent attractor, whereas clean encoders exhibit only weak or unstable trajectories. To quantify this coordinated drift, RSD measures the relative semantic shift between each image's triggered embedding and its original clean embedding. We track the mean RSD across iterations; because this trend is stable under cross-manifold semantic collapsing, our detection scheme converges in about 10 trigger-optimization rounds. Extensive experiments on various stealthily backdoored LVLMs and datasets show that the proposed scheme achieves over 0.99 Accuracy/Precision/Recall/F1 and identifies the backdoor target with over 0.99 accuracy among the Top-5 candidates.
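To make the detection signal described in the abstract concrete, the sketch below illustrates one plausible reading of RSD and its mean trend across trigger-optimization rounds. This is a minimal illustration, not the paper's implementation: the exact RSD formula is not given in the abstract, so the relative-L2-shift formulation, the `rsd` and `mean_rsd_trend` names, and the list-based embedding representation are all assumptions.

```python
from math import sqrt

def _l2(v):
    # Euclidean norm of an embedding vector (plain Python lists here).
    return sqrt(sum(x * x for x in v))

def rsd(clean_emb, trig_emb):
    # Hypothetical RSD formulation: the L2 shift of the triggered
    # embedding relative to the clean embedding, normalized by the
    # clean embedding's norm so the shift is scale-relative.
    diff = [t - c for t, c in zip(trig_emb, clean_emb)]
    return _l2(diff) / _l2(clean_emb)

def mean_rsd_trend(clean_embs, triggered_per_round):
    # Mean RSD over a probe set, one value per trigger-optimization
    # round. Per the abstract's observation, a backdoored encoder
    # would show a stable, coordinated rise within ~10 rounds, while
    # a clean encoder would stay weak or unstable.
    return [
        sum(rsd(c, t) for c, t in zip(clean_embs, triggered)) / len(clean_embs)
        for triggered in triggered_per_round
    ]
```

A detector under this reading would threshold on how consistently the trend rises across rounds, rather than on any single-round distance to a guessed target.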
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: security and privacy, multimodality
Contribution Types: Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 8266