DMID: Dynamic Mask Attention for High-Fidelity Identity Preservation under Limited Data

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Dynamic Mask, Attention, High-Fidelity Identity
TL;DR: DMID: Under limited data, dynamic mask attention meticulously preserves fine-grained identity features and markedly alleviates textual-identity conflicts, achieving a new state-of-the-art for high-fidelity identity preservation.
Abstract: We present Dynamic Mask Attention for High-Fidelity Identity Preservation under Limited Data (DMID), which precisely reconstructs fine-grained identity features under scarce-data conditions while alleviating conflicts between textual and conditional semantics. At its core, DMID employs a Variational Autoencoder (VAE) for fine-grained identity encoding and introduces a Dynamic Attention Mask mechanism, coupled with a Distribution Consistency Loss and an Identity Mask Loss, to ensure identity fidelity while mitigating semantic conflicts. To further reduce annotation and training costs, we design an efficient data construction pipeline. Our method also allows the AttnMask strength factor to be adjusted dynamically at inference time, enabling precise modifications and fine-grained control over identity features and semantics across diverse scenarios. Training proceeds in three stages: (1) identity embedding, (2) dynamic attention mask learning, and (3) Diffusion-DPO post-training. Evaluated on our newly constructed ID Benchmark, DMID achieves state-of-the-art performance in both identity consistency and textual semantics, demonstrating strong competitiveness in data-limited scenarios. Notably, AttnMaskNet contains only about 1% as many parameters as Flux.1-dev.
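The abstract does not specify how the AttnMask strength factor enters the attention computation; the sketch below is one plausible reading, not the paper's implementation. It assumes a per-key identity mask in [0, 1] (standing in for the output of AttnMaskNet, whose interface is not described here) applied as an additive penalty to attention logits, with a `strength` factor that can be varied at inference time:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(q, k, v, id_mask, strength=1.0):
    """Scaled dot-product attention with an additive identity mask.

    id_mask:  hypothetical per-key weights in [0, 1] (1 = keep, 0 = suppress),
              standing in for an AttnMaskNet prediction.
    strength: inference-time factor scaling how strongly masked-out keys
              are suppressed; strength=0 disables the mask entirely.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (n_q, n_k) logits
    scores = scores - strength * (1.0 - id_mask) * 1e4   # penalize masked keys
    return softmax(scores, axis=-1) @ v                  # weighted values
```

With `strength=1.0` the masked keys receive near-zero attention; with `strength=0.0` the mask is a no-op, so sweeping the factor interpolates between unconstrained and fully mask-guided attention.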
Supplementary Material: zip
Primary Area: generative models
Submission Number: 10831