Abstract: Fine-grained video color enhancement delivers superior visual results by making precise adjustments to specific regions of a frame, preserving more natural color relationships than global enhancement techniques. However, applying such region-specific enhancements to video can introduce flickering artifacts and unsatisfactory color blending at object boundaries, because the masks produced by current video segmentation algorithms are coarse and temporally unstable. To overcome these challenges, we introduce MovingColor, which features a novel self-supervised training approach that leverages large-scale video datasets and reformulates color fusion as a generation process conditioned on the original full-frame textures and the color editing information from non-edge regions. We address spatio-temporal inconsistencies with a spectral-spatial hybrid encoder that captures multi-scale spatial and frequency features, improving color adjustments in complex scenes. In addition, our global-local feature propagation module, built on Transformer blocks, consolidates spatio-temporal context to ensure consistency across frames. Both quantitative and subjective evaluations confirm that MovingColor delivers state-of-the-art spatio-temporal consistency for video color enhancement while adhering closely to the intended color editing operations. These results demonstrate that MovingColor makes fine-grained video color grading more effective, efficient, and accessible to a wider range of users. Please refer to the project page for code and models: https://yidong.pro/projects/movingcolor/.
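To illustrate the idea of a spectral-spatial hybrid encoder combining local convolutional features with a global frequency-domain branch, the following is a minimal PyTorch sketch. The layer names, channel counts, and fusion scheme here are illustrative assumptions, not the released MovingColor implementation; see the project page for the actual code.

```python
# Hypothetical sketch of a spectral-spatial hybrid encoder block: a spatial
# convolution branch is fused with a frequency branch that mixes channels in
# the 2D FFT domain. Names and shapes are illustrative assumptions only.
import torch
import torch.nn as nn


class SpectralSpatialBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Spatial branch: local texture features via a 3x3 convolution.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Spectral branch: 1x1 convolution over stacked real/imaginary parts,
        # giving a global receptive field through the frequency domain.
        self.spectral = nn.Sequential(
            nn.Conv2d(channels * 2, channels * 2, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spatial_feat = self.spatial(x)

        # Real 2D FFT over the spatial dimensions.
        freq = torch.fft.rfft2(x, norm="ortho")           # complex, (b, c, h, w//2+1)
        freq = torch.cat([freq.real, freq.imag], dim=1)   # (b, 2c, h, w//2+1)
        freq = self.spectral(freq)
        real, imag = freq.chunk(2, dim=1)
        spectral_feat = torch.fft.irfft2(
            torch.complex(real, imag), s=(h, w), norm="ortho"
        )

        # Fuse spatial and frequency features back to the input width.
        return self.fuse(torch.cat([spatial_feat, spectral_feat], dim=1))


# Usage sketch: encode a batch of frames (batch and time flattened together).
frames = torch.randn(2, 64, 128, 128)
block = SpectralSpatialBlock(64)
out = block(frames)  # (2, 64, 128, 128)
```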