MaskCRT-B: Masked Conditional Residual Transformer for Learned B-frame Coding

Zong-Lin Gao, Yi-Chen Yao, Kuan-Wei Ho, Yi-Hsin Chen, Wen-Hsiao Peng

Published: 01 Jan 2025, Last Modified: 24 Aug 2025, Venue: ISCAS 2025, License: CC BY-SA 4.0

Abstract: This paper proposes a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2025. Recently, masked conditional residual coding has emerged as an attractive alternative to existing inter-frame coding frameworks, including residual coding, conditional coding, and conditional residual coding. In this work, we propose masked conditional residual B-frame coding, termed MaskCRT-B, for YUV420 videos. It features an asymmetric codec architecture with one joint YUV encoder and two separate decoders, one for Y and one for UV. Moreover, it incorporates a bi-directional adaptive fusion module that refines the bi-directional feature maps to better predict the occluded and dis-occluded regions of the input video. MaskCRT-B presents a significant advancement in learned B-frame coding, outperforming the state-of-the-art conditional B-frame codec from the Grand Challenge at ISCAS 2024.
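The abstract does not spell out the masked conditional residual formulation, so the following is only a toy sketch under a stated assumption: that the coded signal is the masked residual r = x - m * x_c, where x is the current frame, x_c is the temporal predictor, and m is a soft mask in [0, 1]. Under that assumption, m = 1 reduces to conditional residual coding and m = 0 to pure conditional coding. The function names and the lossless round trip are hypothetical; a real codec would code r lossily with a learned network conditioned on x_c.

```python
from typing import List

Frame = List[List[float]]  # toy single-channel frame as nested lists


def masked_residual(x: Frame, x_pred: Frame, mask: Frame) -> Frame:
    """Element-wise r = x - m * x_pred (assumed formulation, not from the abstract)."""
    return [
        [xv - mv * pv for xv, pv, mv in zip(x_row, p_row, m_row)]
        for x_row, p_row, m_row in zip(x, x_pred, mask)
    ]


def reconstruct(r: Frame, x_pred: Frame, mask: Frame) -> Frame:
    """Invert the masking step: x = r + m * x_pred (lossless in this toy setting)."""
    return [
        [rv + mv * pv for rv, pv, mv in zip(r_row, p_row, m_row)]
        for r_row, p_row, m_row in zip(r, x_pred, mask)
    ]


# Hypothetical 2x2 frame and predictor, chosen to be exact in binary floating point.
x = [[0.5, 0.75], [0.25, 1.0]]
x_pred = [[0.5, 0.5], [0.5, 0.5]]

ones = [[1.0, 1.0], [1.0, 1.0]]    # m = 1: plain residual x - x_pred
zeros = [[0.0, 0.0], [0.0, 0.0]]   # m = 0: the frame itself is coded
half = [[0.5, 0.5], [0.5, 0.5]]    # soft mask blending the two extremes

res_full = masked_residual(x, x_pred, ones)
res_none = masked_residual(x, x_pred, zeros)
round_trip = reconstruct(masked_residual(x, x_pred, half), x_pred, half)
```

In this sketch the mask is fixed by hand; in a learned codec it would be predicted per pixel, which lets the codec fall back to conditional coding wherever the predictor is unreliable, such as the occluded and dis-occluded regions mentioned in the abstract.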

External IDs: dblp:conf/iscas/GaoYHCP25