Abstract: Video colorization is a challenging task, requiring structural stability, temporal continuity, and fine-grained control over the colors produced. In this paper, building on a pretrained text-to-image model, we introduce the $\textbf{Gated Color Guidance}$ ($\textbf{GCG}$) module, which enables the model to adaptively perform color propagation or color generation according to the structural differences between reference and grayscale frames. Exploiting this multifunctionality, we propose a novel two-stage colorization strategy. In the first stage, under the reference-mask condition, the model autonomously and jointly colorizes the input keyframes as a one-to-many color-domain mapping, while temporal coherence is enforced by modifying the attention mechanism. In the second stage, under the reference-guided condition, the model effectively captures the colors of matching structures in the reference, and we further introduce the $\textbf{Sliding Reference Grid}$ ($\textbf{SRG}$) strategy to merge and extract color features from multiple frames, providing more stable colorization of the grayscale frames. Through this pipeline, we achieve high-quality and stable video colorization while preserving accurate color details. Additionally, the two-stage strategy is flexible and detachable, allowing users to adjust the number of selected reference frames to balance colorization quality against efficiency. Extensive experiments demonstrate that our method significantly outperforms previous state-of-the-art models in both qualitative comparisons and quantitative measurements.
Primary Subject Area: [Generation] Generative Multimedia
Secondary Subject Area: [Generation] Generative Multimedia
Relevance To Conference: In video colorization, generating continuous and semantically consistent colors remains challenging. We extensively explore how to enable the model to perform high-quality colorization under various conditions while maintaining detail stability throughout the colorization process. Our work offers a practical solution for video colorization and showcases the potential of image diffusion models in this domain.
Supplementary Material: zip
Submission Number: 3530