Highlights

• We propose a generalized semantic-aware hyper-space deformable NeRF-based framework for reconstructing high-fidelity facial avatars from monocular videos, which can be driven by either 3DMM coefficients or audio input.

• We introduce a novel hyper-space deformation module that transforms observation-space coordinates into canonical hyper-space coordinates, capturing both local and global facial dynamics.

• Extensive experiments show that the proposed framework outperforms existing state-of-the-art methods.
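To make the second highlight concrete, below is a minimal sketch of what a hyper-space deformation module can look like: an MLP that takes an observation-space point plus a per-frame deformation code and outputs canonical hyper-space coordinates (a warped 3-D point plus extra ambient dimensions). All names, layer sizes, and the use of a residual warp here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical deformation MLP (assumption, not the paper's exact design):
# maps observation-space x (3-D) + per-frame code w -> canonical hyper-space
# coordinates (x', a), where a are ambient dimensions that let the canonical
# field model topology-changing facial dynamics (e.g. mouth opening).
rng = np.random.default_rng(0)

D_CODE, D_HID, D_AMBIENT = 8, 32, 2  # illustrative sizes

W1 = rng.standard_normal((3 + D_CODE, D_HID)) * 0.1
b1 = np.zeros(D_HID)
W2 = rng.standard_normal((D_HID, 3 + D_AMBIENT)) * 0.1
b2 = np.zeros(3 + D_AMBIENT)

def deform(x, code):
    """Map a batch of observation-space points to canonical hyper-space."""
    code_tiled = np.broadcast_to(code, (x.shape[0], D_CODE))
    h = np.tanh(np.concatenate([x, code_tiled], axis=1) @ W1 + b1)
    out = h @ W2 + b2
    delta, ambient = out[:, :3], out[:, 3:]
    # Residual warp: canonical 3-D point plus ambient coordinates.
    return x + delta, ambient

x = rng.standard_normal((4, 3))     # sampled points along camera rays
code = rng.standard_normal(D_CODE)  # per-frame deformation code
x_canon, ambient = deform(x, code)
print(x_canon.shape, ambient.shape)
```

The canonical NeRF would then be queried at the concatenated `(x_canon, ambient)` coordinates, so per-frame motion is absorbed by the warp while appearance lives in a single shared canonical field.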