A Generative Face Video Coding Framework with Disentangled and Consistent Background

Shiv Gehlot, Guan-Ming Su, Peng Yin, Sean McCarthy, Gary J. Sullivan

Published: 17 Aug 2025, Last Modified: 23 Apr 2026. IEEE International Conference on Image Processing (ICIP), 2025. License: CC BY-NC 4.0

Abstract: Existing generative face video coding (GFVC) frameworks enable ultra-low-bandwidth video communication by transmitting compact facial representations. However, the non-localized nature of the derived facial representations leads to foreground and background blending (entanglement) in the decoded sequences, resulting in geometry distortions. In this work, we propose a GFVC framework that removes this blending and suppresses the induced artifacts. To achieve the disentanglement, the proposed framework 1) separately transmits the foreground and background of the first frame (the base pictures), and 2) performs fusion at the decoder with the background base picture. Further, the proposed methodology supports chroma keying for decoder simplification and background customization. The proposed approach is generic and builds upon existing GFVC frameworks to generate stable and consistent video sequences. Compared to VVC, the proposed algorithm reduces the average bit rate by 52.13% and offers a 1.24% average improvement in DISTS over existing GFVC methods at QP 22. Additionally, subjective evaluations reveal an 88.89% preference for the proposed approach, in contrast to 11.11% for the current best-performing method.
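
As an illustration of the decoder-side fusion and chroma keying mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the generated foreground frame is rendered over a solid key color and that fusion is a simple per-pixel mask composite with the background base picture. The function names, the green key color, and the distance threshold `tol` are all hypothetical choices made for this example.

```python
import numpy as np


def chroma_key_mask(frame, key_color=(0, 255, 0), tol=60.0):
    """Return a float mask: 1.0 where a pixel is foreground, 0.0 where it matches the key.

    Assumption: the foreground was generated over a solid key color, so any
    pixel whose RGB distance to that color is within `tol` is background.
    """
    diff = frame.astype(np.float32) - np.asarray(key_color, dtype=np.float32)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist > tol).astype(np.float32)


def fuse_with_background(fg_frame, bg_base, mask):
    """Composite a decoded foreground frame onto the background base picture.

    fg_frame, bg_base: uint8 arrays of shape (H, W, 3)
    mask: float array of shape (H, W) with values in [0, 1]
    """
    m = mask[..., None]  # broadcast the mask over the color channels
    out = m * fg_frame.astype(np.float32) + (1.0 - m) * bg_base.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Toy usage: a 2x2 frame that is all key color except one foreground pixel.
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[...] = (0, 255, 0)
frame[0, 0] = (200, 50, 50)
background = np.full((2, 2, 3), (10, 20, 30), dtype=np.uint8)

mask = chroma_key_mask(frame)
fused = fuse_with_background(frame, background, mask)
```

Because the same background base picture is reused for every frame in this sketch, the background region is identical across the output sequence. Swapping in a different `background` array corresponds to the background customization the abstract refers to.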