Abstract: While image-based inverse tone mapping (iTM) has been extensively studied, research on video-based iTM remains limited. Leveraging image-based models for video iTM presents two key challenges: (1) incorporating the global operations essential for HDR video production, and (2) modeling spatial-temporal information. To address these issues, we propose integrating a kernel prediction network (KPN) with multi-frame interactions (MFI) to model spatial-temporal context. Additionally, we introduce a global color mapping network (GCMN) alongside the KPN to simulate global operations, focusing on SDR pixels near the BT.709 color gamut boundaries. The MFI module refines spatial-temporal consistency by leveraging correlations across frames. Both GCMN and MFI can be seamlessly integrated into existing image-based iTM models to extend them to video iTM. Moreover, we introduce two losses for video iTM: an inter-frame brightness consistency loss based on the Gaussian pyramid, and a differential histogram loss to capture the global color distribution. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on both image-based and video-based iTM.
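The abstract names the two video-oriented losses but does not give their formulas. The sketch below is a minimal, hypothetical PyTorch rendering of plausible versions, assuming a 5x5 binomial kernel for the Gaussian pyramid and Gaussian soft-binning to make the histogram differentiable; all function names, bin counts, and exact formulations are illustrative assumptions, not the paper's definitions.

```python
# Hypothetical sketch (not the authors' code) of the two auxiliary losses
# described in the abstract, under assumed formulations.
import torch
import torch.nn.functional as F


def gaussian_pyramid(x, levels=3):
    """Gaussian pyramid via repeated blur-and-downsample (assumed 5x5 binomial kernel)."""
    k1 = torch.tensor([1.0, 4.0, 6.0, 4.0, 1.0])
    k2 = torch.outer(k1, k1)
    k2 = (k2 / k2.sum()).to(x)
    c = x.shape[1]
    weight = k2.view(1, 1, 5, 5).repeat(c, 1, 1, 1)  # depthwise blur kernel
    pyramid = [x]
    for _ in range(levels - 1):
        blurred = F.conv2d(pyramid[-1], weight, padding=2, groups=c)
        pyramid.append(F.avg_pool2d(blurred, 2))
    return pyramid


def brightness_consistency_loss(pred_t, pred_t1, gt_t, gt_t1, levels=3):
    """Penalize inter-frame brightness changes that deviate from the ground-truth
    change, measured on per-level means of a Gaussian pyramid (assumed form)."""
    loss = pred_t.new_zeros(())
    for p0, p1, g0, g1 in zip(gaussian_pyramid(pred_t, levels),
                              gaussian_pyramid(pred_t1, levels),
                              gaussian_pyramid(gt_t, levels),
                              gaussian_pyramid(gt_t1, levels)):
        pred_delta = p1.mean(dim=(1, 2, 3)) - p0.mean(dim=(1, 2, 3))
        gt_delta = g1.mean(dim=(1, 2, 3)) - g0.mean(dim=(1, 2, 3))
        loss = loss + (pred_delta - gt_delta).abs().mean()
    return loss / levels


def soft_histogram(x, bins=64, sigma=0.02):
    """Differentiable histogram: soft-assign pixels to bin centers with a Gaussian kernel."""
    centers = torch.linspace(0.0, 1.0, bins, device=x.device)
    diff = x.reshape(x.shape[0], -1, 1) - centers          # (B, N, bins)
    weights = torch.exp(-0.5 * (diff / sigma) ** 2)
    hist = weights.sum(dim=1)                               # (B, bins)
    return hist / hist.sum(dim=1, keepdim=True)


def histogram_loss(pred, gt, bins=64):
    """Match the global per-channel color distribution of prediction and ground truth."""
    loss = pred.new_zeros(())
    for c in range(pred.shape[1]):
        loss = loss + (soft_histogram(pred[:, c], bins)
                       - soft_histogram(gt[:, c], bins)).abs().sum(dim=1).mean()
    return loss / pred.shape[1]
```

Both terms operate on whole frames rather than local patches, which is consistent with the abstract's emphasis on global operations and temporal consistency, but the specific choices here (pyramid depth, bin count, kernel width) are placeholders.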
External IDs: dblp:journals/tce/YanZZC25