Abstract: Existing generative face video coding (GFVC) frameworks enable ultra-low-bandwidth video communication by transmitting compact facial representations. However, because the derived facial representations are not localized, the foreground and background become blended (entangled) in the decoded sequences, resulting in geometric distortions. In this work, we propose a GFVC framework that removes this blending and suppresses the induced artifacts. To achieve this disentanglement, the proposed framework 1) transmits the foreground and background of the first frame (the base pictures) separately, and 2) performs fusion with the background base picture at the decoder end. Further, the proposed methodology supports chroma keying, which simplifies the decoder and enables background customization. The proposed approach is generic and builds upon existing GFVC frameworks to generate stable and consistent video sequences. Compared with VVC, the proposed algorithm reduces the average bit rate by 52.13%, and it offers a 1.24% average improvement in DISTS over existing GFVC methods at QP 22. Additionally, subjective evaluations reveal an 88.89% preference for the proposed approach, in contrast to 11.11% for the current best-performing method.