Joker: Conditional 3D Head Synthesis With Extreme Facial Expressions

Malte Prinzler; Egor Zakharov; Vanessa Skliarova; Berna Kabadayi; Justus Thies

Published: 23 Mar 2025, Last Modified: 24 Mar 2025, Venue: 3DV 2025 Poster, License: CC BY 4.0

Keywords: Head Avatars, Expression, Diffusion Models, 3D Distillation

TL;DR: Based on a single portrait image of a person, we synthesize a volumetric representation of this person under an extreme expression using text and 3DMM controls.

Abstract: We introduce Joker, a new method for the conditional synthesis of 3D human heads with extreme expressions. Given a single reference image of a person, we synthesize a volumetric human head with the reference’s identity and a new expression. We offer control over the expression via a 3D morphable model (3DMM) and textual inputs. This multi-modal conditioning signal is essential since 3DMMs alone fail to define subtle emotional changes and extreme expressions, including those involving the mouth cavity and tongue articulation. Our method is built upon a 2D diffusion-based prior that generalizes well to out-of-domain samples, such as sculptures, heavy makeup, and paintings, while achieving high levels of expressiveness. To improve view consistency, we propose a new 3D distillation technique that converts predictions of our 2D prior into a neural radiance field (NeRF). Both the 2D prior and our distillation technique produce state-of-the-art results, as confirmed by our extensive evaluations. To the best of our knowledge, our method is also the first to achieve view-consistent extreme tongue articulation.

Supplementary Material: zip

Submission Number: 383
