Conditional Variational Autoencoders for Hierarchical B-frame Coding

Zong-Lin Gao, Cheng-Wei Chen, Yi-Chen Yao, Cheng-Yuan Ho, Wen-Hsiao Peng

Published: 2024, Last Modified: 24 Aug 2025, ISCAS 2024, CC BY-SA 4.0

Abstract: In response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2024, this paper proposes a learned hierarchical B-frame coding scheme. Most learned video codecs concentrate on P-frame coding for RGB content, while B-frame coding for YUV420 content remains largely under-explored. Some earlier works explored Conditional Augmented Normalizing Flows (CANF) for B-frame coding. However, they suffer from high computational complexity because they stack multiple variational autoencoders (VAEs) and use separate Y and UV codecs. This work aims to develop a lightweight VAE-based B-frame codec in a conditional coding framework. It features (1) multi-scale feature extraction for conditional motion and inter-frame coding, (2) frame-type adaptive coding for better bit allocation, and (3) a lightweight conditional VAE backbone that encodes YUV420 content by simply converting it into YUV444 content for joint Y and UV coding. Experimental results confirm that it achieves better compression performance than the CANF-based B-frame codec from last year's challenge, with much lower complexity.
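The abstract does not say how the YUV420-to-YUV444 conversion is done. The sketch below assumes nearest-neighbor 2x upsampling of the U and V planes before coding and 2x2 averaging to return to YUV420 after decoding; the function names are hypothetical and not taken from the paper. It only illustrates how bringing all three planes to luma resolution lets a single codec process Y and UV jointly instead of running separate Y and UV codecs.

```python
from typing import List, Tuple

# A plane is a 2-D array of integer samples, stored as a list of rows.
Plane = List[List[int]]


def upsample_chroma_nearest(plane: Plane) -> Plane:
    """Upsample a 4:2:0 chroma plane by 2x in each direction (nearest neighbor)."""
    out: Plane = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))  # repeat each row vertically
    return out


def downsample_chroma_avg(plane: Plane) -> Plane:
    """Downsample a 4:4:4 chroma plane to 4:2:0 by averaging each 2x2 block."""
    h, w = len(plane), len(plane[0])
    return [
        [
            # +2 gives round-half-up for non-negative integer samples
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def yuv420_to_yuv444(y: Plane, u: Plane, v: Plane) -> List[Plane]:
    """Return a 3-channel frame [Y, U, V] with all planes at luma resolution."""
    if len(u) * 2 != len(y) or len(u[0]) * 2 != len(y[0]):
        raise ValueError("chroma planes must be half the luma size in each dimension")
    if len(v) != len(u) or len(v[0]) != len(u[0]):
        raise ValueError("U and V planes must have the same size")
    return [y, upsample_chroma_nearest(u), upsample_chroma_nearest(v)]


def yuv444_to_yuv420(frame: List[Plane]) -> Tuple[Plane, Plane, Plane]:
    """Split a decoded [Y, U, V] frame and subsample chroma back to 4:2:0."""
    y, u, v = frame
    return y, downsample_chroma_avg(u), downsample_chroma_avg(v)


if __name__ == "__main__":
    y = [[16, 32, 48, 64], [80, 96, 112, 128]]  # 2x4 luma
    u = [[100, 110]]                            # 1x2 chroma
    v = [[120, 130]]
    frame = yuv420_to_yuv444(y, u, v)
    print(frame[1])  # upsampled U plane, same size as Y
    print(yuv444_to_yuv420(frame) == (y, u, v))
```

With nearest-neighbor upsampling followed by 2x2 averaging, the chroma round trip is exact, so any chroma distortion would come from the codec itself. A bilinear or learned upsampler would be an equally plausible reading of "simple conversion".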

External IDs: dblp:conf/iscas/GaoCYHP24