SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Diffusion Models, Noise Scheduling, Perceptual Consistency, Reference-guided Generation, Conditional Diffusion Models, Generative Modeling, Sketch-to-Image Translation, Structural Similarity Index
TL;DR: We propose SSIMBaD, a diffusion-based anime sketch colorization method with SSIM-guided sigma scaling. It enforces perceptually uniform degradation across diffusion timesteps, yielding better structural fidelity and style transfer than prior models.
Abstract: We propose a novel diffusion-based framework for the automatic colorization of anime-style facial sketches that preserves the structural fidelity of the input sketch while effectively transferring stylistic attributes from a reference image. Our approach builds upon recent continuous-time diffusion models but departs from traditional methods that rely on predefined noise schedules, which often fail to maintain perceptual consistency across the generative trajectory. To address this, we introduce SSIMBaD (Sigma Scaling with SSIM-Guided Balanced Diffusion), a sigma-space transformation that linearly aligns perceptual degradation, as measured by structural similarity (SSIM). This perceptual scaling enforces uniform visual difficulty across timesteps, enabling more balanced and faithful reconstructions. On a large-scale anime face dataset, SSIMBaD attains state-of-the-art structural fidelity and strong perceptual quality, with robust generalization to diverse styles and structural variations.
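To make the core idea concrete, here is a minimal Python sketch (not the authors' implementation) of how an SSIM-guided sigma schedule could be constructed: probe how SSIM between a clean image and its noised version decays over a grid of sigma values, then invert that curve so SSIM falls off linearly across sampling steps. The function names, the single-noise-draw estimate, and the EDM-style defaults `sigma_min=0.002` / `sigma_max=80.0` are illustrative assumptions.

```python
# Hedged sketch of SSIM-guided sigma scaling; not the paper's official code.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_curve(x0, sigmas, rng):
    """SSIM(x0, x0 + sigma * eps) for each sigma, using one shared noise draw."""
    eps = rng.standard_normal(x0.shape)
    return np.array([
        ssim(x0, x0 + s * eps, data_range=float(x0.max() - x0.min()))
        for s in sigmas
    ])

def ssim_linear_schedule(x0, num_steps=50, sigma_min=0.002, sigma_max=80.0, seed=0):
    """Return sigmas whose induced SSIM degradation is ~linear across steps."""
    rng = np.random.default_rng(seed)
    # Dense log-spaced probe grid over the full sigma range.
    probe = np.exp(np.linspace(np.log(sigma_min), np.log(sigma_max), 256))
    curve = ssim_curve(x0, probe, rng)
    # Enforce monotone decay so the SSIM-vs-sigma curve is invertible.
    curve = np.minimum.accumulate(curve)
    # Target SSIM values decreasing linearly from cleanest to noisiest step.
    targets = np.linspace(curve[0], curve[-1], num_steps)
    # Invert the curve: for each target SSIM level, interpolate the matching sigma.
    return np.interp(targets, curve[::-1], probe[::-1])
```

For instance, passing a representative grayscale sketch `x0` (e.g., a 256×256 float array) yields a monotone sigma schedule that could stand in for a plain log-linear schedule at sampling time, so that each step contributes a comparable amount of perceptual change.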
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 26501