Abstract: Temporal Automatic White Balance (TAWB) corrects the color cast within each frame while ensuring consistent illumination across consecutive frames. Unlike conventional AWB, TAWB has received limited research attention for an extended period. However, the growing popularity of short-form videos has drawn increasing attention to video color experiences. To further advance research on TAWB, we address the bottlenecks associated with datasets, models, and benchmarks. 1) Dataset challenge: Currently, only one TAWB dataset (BCC), captured with a single camera, is available, and it lacks temporal continuity due to the difficulty of capturing realistic illuminations and dynamic raw data. In response, we meticulously designed an acquisition strategy based on the actual distribution pattern of illuminations and constructed a comprehensive TAWB dataset named CTA, captured with 6 cameras and covering 12K continuous illuminations. Furthermore, we employed video frame interpolation techniques to extend the captured static raw data into dynamic form while preserving continuous illumination. 2) Model challenge: Both of the prevailing TAWB methods rely on LSTM, whose fixed gating mechanism often fails to adapt to varying content or illumination, resulting in unstable illumination estimation. In response, we propose CTANet, which integrates cross-frame attention and RepViT to self-adjust to content and illumination variations. Additionally, the mobile-friendly design of RepViT enhances the portability of CTANet. 3) Benchmark challenge: To date, there is no benchmark of TAWB methods across illumination and camera types. To address this, we establish a benchmark by comparing 8 cutting-edge AWB and TAWB methods together with CTANet across 3 typical illumination scenes and 7 cameras from two representative datasets. Our dataset and code are available in the supplementary material.
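To make the cross-frame attention idea mentioned above concrete, the following is a minimal sketch in PyTorch. The class name CrossFrameAttention, the token shapes, and the use of nn.MultiheadAttention are illustrative assumptions, not the authors' CTANet implementation.

```python
# Minimal sketch of cross-frame attention between feature maps of two
# consecutive frames; names and tensor shapes are illustrative assumptions,
# not the authors' CTANet implementation.
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # Queries come from the current frame, keys/values from the previous
        # frame, so the illuminant estimate can adapt to temporal changes.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cur_feat: torch.Tensor, prev_feat: torch.Tensor) -> torch.Tensor:
        # cur_feat, prev_feat: (B, N, C) token sequences from a per-frame
        # backbone (e.g., a mobile-friendly network such as RepViT).
        fused, _ = self.attn(query=cur_feat, key=prev_feat, value=prev_feat)
        return self.norm(cur_feat + fused)  # residual fusion of temporal context


if __name__ == "__main__":
    b, n, c = 2, 196, 64
    cfa = CrossFrameAttention(dim=c)
    out = cfa(torch.randn(b, n, c), torch.randn(b, n, c))
    print(out.shape)  # torch.Size([2, 196, 64])
```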
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: 1.This paper focuses on temporal automatic white balance (TAWB) in videos, including in-depth analysis and improvement of datasets, methods and benchmarks.
2. Relevance to multimedia: The proposed TAWB method corrects color deviations in images and videos to produce more realistic and natural colors, thereby enhancing the visual quality of image and video content; a minimal correction sketch follows this list. This is crucial for multimedia applications that rely on visual content.
3. Relevance to multimodality: The proposed dataset is annotated along 10 dimensions, including content and color descriptions; time, location, and weather during shooting; color temperature and type of illumination; type of shooting scene; and the brand and model of the capture device.
4. Contribution to multimedia and multimodal processing: This paper introduces a TAWB method designed to prevent flickering in corrected videos by integrating cross-frame attention and RepViT to adapt to content and illumination variations. Additionally, its mobile-friendly parameter count enhances the portability of the proposed method. Moreover, this paper proposes the most extensive TAWB dataset to date, covering both video and image content, which substantially advances the research frontier of TAWB.
5. This paper reviews and references 10 MM/TMM papers within the multimedia field.
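To illustrate the color-correction step referenced in item 2, below is a minimal sketch of diagonal (von Kries) white-balance correction applied to a frame with an estimated illuminant. The function name apply_white_balance, the array layout, and the green-channel normalization are hypothetical assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of per-frame white-balance correction given an estimated
# illuminant (diagonal / von Kries model). Layout and normalization are
# illustrative assumptions, not the paper's exact pipeline.
import numpy as np

def apply_white_balance(raw_rgb: np.ndarray, illuminant: np.ndarray) -> np.ndarray:
    """Scale each channel of a linear-RGB frame by per-channel gains.

    raw_rgb:    (H, W, 3) linear RGB frame, values in [0, 1]
    illuminant: (3,) estimated illuminant color, e.g. from a TAWB model
    """
    gains = illuminant[1] / illuminant          # normalize so the green gain is 1
    corrected = raw_rgb * gains[None, None, :]  # per-channel scaling removes the cast
    return np.clip(corrected, 0.0, 1.0)

# Example: a reddish cast (illuminant stronger in R) is removed per frame;
# temporally smooth illuminant estimates keep the corrected video flicker-free.
frame = np.random.rand(4, 4, 3)
est_illum = np.array([0.6, 0.5, 0.35])
balanced = apply_white_balance(frame, est_illum)
```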
Supplementary Material: zip
Submission Number: 3823