MovingColor: Seamless Fusion of Fine-grained Video Color Enhancement

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Oral · CC BY 4.0
Abstract: Fine-grained video color enhancement delivers superior visual results by making precise adjustments to specific regions of a frame, preserving more natural color relationships than global enhancement techniques. However, applying such localized enhancements to dynamic content can cause flickering artifacts and unsatisfactory color blending at object boundaries, because current video segmentation algorithms produce coarse and temporally unstable masks. To overcome these challenges, we introduce MovingColor, which features a novel self-supervised training scheme that leverages large-scale video datasets: it reformulates color fusion as a generation process conditioned on the original full-frame textures and the color edits in non-edge regions. To address spatio-temporal inconsistencies, we design a spectral-spatial hybrid encoder that captures multi-scale features in both the spatial and frequency domains, improving the fidelity of color adjustments in complex scenes. In addition, a global-local feature propagation module built on Transformer blocks aggregates spatio-temporal context to ensure consistency across frames. Quantitative and subjective evaluations confirm that MovingColor achieves state-of-the-art spatio-temporal consistency for video color enhancement while adhering closely to the intended color editing operations. These results demonstrate that MovingColor makes fine-grained video color grading more effective, efficient, and accessible to a wider range of users. We will release the code to support further research and practical applications.
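The abstract describes the spectral-spatial hybrid encoder only at a high level. Below is a minimal, illustrative PyTorch sketch of how such a block could pair a local spatial branch with a frequency-domain branch; all module names, shapes, and design choices here (the rFFT-based `SpectralBranch`, the 1x1 fusion convolution) are assumptions for illustration, not the paper's released implementation.

```python
# Illustrative sketch only: the paper does not detail this architecture in the
# abstract, so every module name and hyperparameter below is an assumption.
import torch
import torch.nn as nn


class SpectralBranch(nn.Module):
    """Mixes features in the frequency domain via a real 2D FFT (one plausible
    way to capture the 'frequency features' the abstract mentions)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv over the stacked (real, imaginary) parts of the spectrum.
        self.freq_conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # complex, (b, c, h, w//2+1)
        spec = torch.cat([spec.real, spec.imag], dim=1)  # (b, 2c, h, w//2+1)
        spec = self.freq_conv(spec)
        real, imag = spec.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


class SpectralSpatialBlock(nn.Module):
    """Hypothetical hybrid block: a local 3x3 spatial branch plus a global
    spectral branch, fused by a 1x1 convolution with a residual connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.GELU(),
        )
        self.spectral = SpectralBranch(channels)
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fuse(torch.cat([self.spatial(x), self.spectral(x)], dim=1))


if __name__ == "__main__":
    block = SpectralSpatialBlock(channels=32)
    frames = torch.randn(2, 32, 64, 64)  # (batch, channels, H, W)
    print(block(frames).shape)           # torch.Size([2, 32, 64, 64])
```

The appeal of a frequency-domain branch in this setting is its effectively global receptive field: a single spectral mixing step lets color statistics propagate across the whole frame, which complements the local spatial convolutions when blending edits near imprecise mask boundaries.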
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Generation] Generative Multimedia
Relevance To Conference: This work exemplifies the MM conference's focus on advancing inherently multimedia and multimodal research through its novel MovingColor approach to fine-grained video color enhancement. By leveraging large-scale unlabeled video datasets and introducing a self-supervised learning method, MovingColor addresses key challenges in video color editing, such as the spatial and temporal inconsistencies caused by imprecise and unstable segmentation masks. The proposed spectral-spatial hybrid encoder and global-local feature propagation module demonstrate the importance of multimodal processing for video color enhancement: they enable MovingColor to capture and integrate multi-scale features from both the spatial and frequency domains and to aggregate spatio-temporal context across video frames. This design is crucial for producing spatially and temporally consistent color fusion results that preserve the intended color edits while maintaining natural transitions and textures. Moreover, the introduction of the D5 dataset, featuring diverse 4K video clips with ground-truth mattes, supports comprehensive evaluation of color fusion performance in a multimedia context. By advancing fine-grained video color enhancement and providing a valuable evaluation dataset, this work contributes significantly to multimedia processing, aligning strongly with the MM conference's mission to promote inherently multimedia and multimodal research.
Supplementary Material: zip
Submission Number: 279