Abstract: Color plays a vital role in how we perceive and interact with the world, particularly in video, where its presence or absence carries significant weight. Generating accurate color for videos that lack original color information is therefore an important research problem. Despite recent attention to this subject, open issues remain in improving result quality and exploring novel approaches. To address these gaps, we propose Swin Flow Video Colorization (SwinFVC), a framework that combines state-of-the-art computer vision techniques, such as Swin Transformers, with classical convolutional neural networks (CNNs) to extract color information from reference images. We also introduce the Flow Color encoding block, a subnetwork that generates feature representations capturing color dynamics across images and video streams. Our implementation was trained on the DAVIS and LDV datasets and outperformed current state-of-the-art methods on the Fréchet Inception Distance (FID) and Color Distribution Consistency (CDC) metrics.
External IDs: dblp:conf/icmla/StivalTP23