Keywords: Safety Alignment, Diffusion Large Language Models
TL;DR: We identify and theoretically explain mask-based jailbreak vulnerabilities in diffusion LLMs, and propose a Reject-MASK defense that reduces attack success from over 90% to single digits while preserving utility.
Abstract: Diffusion large language models (dLLMs) extend the diffusion process to discrete domains such as text, demonstrating strong performance across many tasks.
However, their bidirectional, parallel decoding architecture introduces unique safety risks that bypass existing safeguards.
We show that dLLMs are highly vulnerable to **MASK**-based jailbreaks, in which adversarial prompts exploit masked tokens to elicit fluent but unsafe completions.
Through rigorous theoretical analysis and formal proofs, we identify margin accumulation and scheduling advantages as fundamental causes of this vulnerability.
To address these risks, we introduce a two-stage data synthesis framework together with a Reject-MASK training strategy.
Experimental results demonstrate that our approach consistently suppresses attack success rates from above 90\% to single-digit levels, while retaining competitive utility across diverse benchmarks.
By grounding defense design in rigorous theoretical analysis, our work establishes a principled foundation for the safety of diffusion-based large language models and provides a scalable, practical alignment framework for their secure deployment in real-world applications.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 10906