AnoDiT: Mask-Guided DiT Inpainting Models for Anomaly Image Generation

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Image generation, Image inpainting, Diffusion model, Diffusion Transformer, Industrial anomaly image generation, Mask generation.
TL;DR: Mask-guided DiT inpainting models for industrial defect and anomaly synthesis.
Abstract: Effective training of industrial anomaly detection (AD) models is persistently hindered by the scarcity and limited diversity of real anomaly samples. While generative methods have been proposed to augment anomaly data, they often struggle with a critical trade-off between generation controllability, background fidelity, and the realism of the synthesized anomalies. In this paper, we propose AnoDiT, a novel mask-guided anomaly generation framework that leverages a Diffusion Transformer (DiT) for high-fidelity inpainting. To ensure the perceptual plausibility of generated anomalies, we introduce a Laplacian Pyramid-based Texture Decomposition Module, which guides the model to learn deep texture representations of anomalous regions. Furthermore, for seamless integration of the anomaly into the pristine background, we design an Anomaly Region Focusing mechanism with Edge Weighting, which encourages the model to learn a natural transition at the defect boundary and is enhanced by a multi-round resampling process. To establish a fully automated pipeline and overcome the annotation bottleneck, we also develop a conditional diffusion model incorporating a Positional Prior to generate diverse and realistically located anomaly masks. This dual-model pipeline not only enables fine-grained control over the anomaly's geometry and texture but also simultaneously yields pixel-perfect labels. Experiments demonstrate that data synthesized by AnoDiT significantly improves the performance of downstream anomaly inspection tasks.
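The abstract does not specify the exact form of the Laplacian Pyramid-based Texture Decomposition Module, but the underlying idea is the standard Laplacian pyramid: an image is split into band-pass (texture/detail) levels plus a coarse low-frequency residual, and the decomposition is exactly invertible. The sketch below is an illustrative, minimal NumPy implementation of that general construction, not the paper's module; it substitutes 2x2 average pooling for the usual Gaussian blur-and-subsample step, and all function names are our own.

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling; a simple stand-in for Gaussian blur + stride-2 subsampling
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    # nearest-neighbour upsampling, cropped back to the finer level's shape
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Decompose img into `levels` band-pass images plus a low-pass residual."""
    pyramid = []
    cur = img.astype(np.float64)
    for _ in range(levels):
        down = downsample(cur)
        # high-frequency band: what the coarser level fails to capture
        pyramid.append(cur - upsample(down, cur.shape))
        cur = down
    pyramid.append(cur)  # coarsest low-pass residual
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by upsampling and adding the bands back in."""
    cur = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        cur = band + upsample(cur, band.shape)
    return cur
```

With power-of-two image sizes the reconstruction is exact, which is why such pyramids are a convenient handle on texture: a model can be supervised on the band-pass levels (where defect texture lives) without losing the low-frequency background.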
Primary Area: generative models
Submission Number: 8850