Joker: Conditional 3D Head Synthesis With Extreme Facial Expressions

Published: 23 Mar 2025, Last Modified: 24 Mar 20253DV 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Head Avatars, Expression, Diffusion Models, 3D Distillation
TL;DR: Based on a single portrait image of a person, we synthesize a volumetric representation of this person under an extreme expression using text and 3DMM controls.
Abstract: We introduce Joker, a new method for the conditional synthesis of 3D human heads with extreme expressions. Given a single reference image of a person, we synthesize a volumetric human head with the reference’s identity and a new expression. We offer control over the expression via a 3D morphable model (3DMM) and textual inputs. This multi-modal conditioning signal is essential since 3DMMs alone fail to define subtle emotional changes and extreme expressions, including those involving the mouth cavity and tongue articulation. Our method is built upon a 2D diffusion-based prior that generalizes well to out-of-domain samples, such as sculptures, heavy makeup, and paintings while achieving high levels of expressiveness. To improve view consistency, we propose a new 3D distillation technique that converts predictions of our 2D prior into a neural radiance field (NeRF). Both the 2D prior and our distillation technique produce state-of-the-art results, which are confirmed by our extensive evaluations. Also, to the best of our knowledge, our method is the first to achieve view-consistent extreme tongue articulation.
Supplementary Material: zip
Submission Number: 383
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview