Abstract: As video conferencing becomes an indispensable part of people’s daily lives, achieving a high-fidelity calling experience under low bandwidth has become a popular yet challenging problem. Deep generative models hold great potential for low-bandwidth facial video compression owing to their excellent ability to generate content from abridged information. Nevertheless, existing deep generation-based compression methods tend to handle motion information in purely 2D or pseudo-3D space, causing facial distortion when large head poses are encountered. In this paper, we propose a 3D-aware high-fidelity facial video conferencing system built on a parameterized NeRF-based face model. By compressing the parameterized face model and transmitting only the extracted facial parameters, we achieve high-fidelity talking-head synthesis for video conferencing at an ultra-low bitrate. Additionally, the system’s 3D perception capability allows viewpoint control over the head, providing higher interactivity and practicality. Extensive experiments verify the effectiveness of the proposed 3D-aware high-fidelity free-view facial video conferencing system.