Efficient Backdoor Detection on Text-to-image Synthesis via Neuron Activation Variation

Published: 06 Mar 2025, Last Modified: 20 Mar 2025, ICLR 2025 FM-Wild Workshop, CC BY 4.0
Keywords: AI Security, Text-to-image Synthesis, Backdoor Defenses, Diffusion Models
TL;DR: This paper proposes NaviDet, an input-level backdoor detection framework on text-to-image synthesis.
Abstract: In recent years, text-to-image (T2I) diffusion models have garnered significant attention for their ability to generate high-quality images reflecting text prompts. However, their growing popularity has also led to the emergence of backdoor threats, posing substantial risks. Currently, effective defense strategies against such threats are lacking due to the diversity of backdoor targets in T2I synthesis. In this paper, we propose $\textbf{NaviDet}$, the first general input-level backdoor detection framework for identifying backdoor inputs across various backdoor targets. Our approach is based on the new observation that trigger tokens tend to induce significant neuron activation variation in the early stage of the diffusion generation process, a phenomenon we term $\textbf{Early-step Activation Variation}$. Leveraging this insight, $\textbf{NaviDet}$ detects malicious samples by analyzing neuron activation variations caused by input tokens. Extensive experiments demonstrate the effectiveness and efficiency of $\textbf{NaviDet}$ against various T2I backdoors, surpassing the baselines.
Submission Number: 104
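
The abstract describes detection by measuring how much each input token changes neuron activations early in generation. The following is a toy sketch of that general idea only, not the paper's implementation: the masking strategy, the L2 distance, the threshold, and every name here (`activation_variation`, `is_suspicious`, `toy_activations`, the trigger token `tq`) are illustrative assumptions. A real setup would replace `toy_activations` with activations read from the diffusion model at an early step.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def activation_variation(
    tokens: Sequence[str],
    activations: Callable[[Sequence[str]], Vector],
    mask_token: str = "<mask>",  # assumed placeholder token
) -> List[float]:
    """For each token, measure how far the activations move when it is masked."""
    base = activations(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = mask_token
        scores.append(l2_distance(base, activations(masked)))
    return scores


def is_suspicious(
    tokens: Sequence[str],
    activations: Callable[[Sequence[str]], Vector],
    threshold: float,  # assumed to be calibrated on clean prompts
) -> bool:
    """Flag the prompt if any single token causes an unusually large variation."""
    scores = activation_variation(tokens, activations)
    return bool(scores) and max(scores) > threshold


def toy_activations(tokens: Sequence[str]) -> Vector:
    """Stand-in for early-step model activations (purely hypothetical).

    Ordinary tokens nudge the second coordinate slightly; the made-up
    trigger token "tq" shifts the first coordinate by a large amount.
    """
    vec = [0.0, 0.0]
    for t in tokens:
        if t == "tq":
            vec[0] += 10.0
        elif t != "<mask>":
            vec[1] += 0.1
    return vec


clean = ["a", "photo", "of", "a", "dog"]
triggered = ["a", "tq", "photo", "of", "a", "dog"]
print(is_suspicious(clean, toy_activations, threshold=1.0))
print(is_suspicious(triggered, toy_activations, threshold=1.0))
```

In this toy setting the per-token scores for the clean prompt are all small and similar, while the prompt containing the hypothetical trigger has one score that dominates the rest, which is the kind of gap a threshold could separate.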