DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

Published: 01 Jan 2024, Last Modified: 05 May 2025 · CVPR 2024 · CC BY-SA 4.0
Abstract: DiffusionAvatars synthesizes a high-fidelity 3D head avatar of a person, offering intuitive control over both pose and expression. We propose a diffusion-based neural renderer that leverages generic 2D priors to produce compelling images of faces. For coarse guidance of the expression and head pose, we render a neural parametric head model (NPHM) from the target viewpoint, which acts as a proxy geometry of the person. Additionally, to enhance the modeling of intricate facial expressions, we condition DiffusionAvatars directly on the expression codes obtained from NPHM via cross-attention. Finally, to synthesize consistent surface details across different viewpoints and expressions, we rig learnable spatial features to the head's surface via TriPlane lookup in NPHM's canonical space. We train DiffusionAvatars on RGB videos and corresponding fitted NPHM meshes of a person and test the obtained avatars in both self-reenactment and animation scenarios. Our experiments demonstrate that DiffusionAvatars generates temporally consistent and visually appealing videos for novel poses and expressions of a person, outperforming existing approaches.
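To make the TriPlane lookup in the canonical space more concrete, the following is a minimal sketch of how such a feature query could look; it is not the authors' implementation, and the plane resolution, feature dimension, and summation-based aggregation are assumptions chosen for illustration.

```python
# Hypothetical TriPlane feature lookup: learnable feature planes queried at
# canonical 3D surface points, so features stay rigged to the head surface
# across viewpoints and expressions. Shapes and hyperparameters are assumed.
import torch
import torch.nn.functional as F


class TriPlaneFeatures(torch.nn.Module):
    def __init__(self, feature_dim: int = 32, resolution: int = 256):
        super().__init__()
        # Three learnable feature planes: XY, XZ, and YZ.
        self.planes = torch.nn.Parameter(
            torch.randn(3, feature_dim, resolution, resolution) * 0.01
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        """xyz: (N, 3) canonical coordinates in [-1, 1]; returns (N, feature_dim)."""
        # Project each point onto the three axis-aligned planes.
        coords = torch.stack(
            [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]], dim=0
        )  # (3, N, 2)
        grid = coords.unsqueeze(1)  # (3, 1, N, 2) as expected by grid_sample
        sampled = F.grid_sample(
            self.planes, grid, mode="bilinear", align_corners=True
        )  # (3, feature_dim, 1, N)
        # Aggregate the three per-plane features by summation.
        return sampled.sum(dim=0).squeeze(1).permute(1, 0)  # (N, feature_dim)


# Usage sketch: query features at canonical surface points of the NPHM proxy.
if __name__ == "__main__":
    lookup = TriPlaneFeatures()
    points = torch.rand(1024, 3) * 2.0 - 1.0  # dummy canonical points
    features = lookup(points)
    print(features.shape)  # torch.Size([1024, 32])
```

Features sampled this way could then be rasterized with the proxy mesh and fed, alongside the cross-attended expression codes, as conditioning to the diffusion-based renderer; the exact conditioning interface is not specified in the abstract.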