TEDRA: Text-based Editing of Dynamic and Photoreal Actors

Basavaraj Sunagad; Heming Zhu; Mohit Mendiratta; Adam Kortylewski; Christian Theobalt; Marc Habermann

Published: 23 Mar 2025, Last Modified: 24 Mar 2025

Venue: 3DV 2025 Poster

License: CC BY 4.0

Keywords: Neural Rendering, Neural (implicit) representations, 3D human body shape modeling, Generative models

Abstract: Over the past years, significant progress has been made in creating photorealistic and drivable 3D avatars solely from videos of real humans. However, a core remaining challenge is the fine-grained and user-friendly editing of clothing styles by means of textual descriptions. In particular, text-based edits of full-body avatars should satisfy two properties: 1) spatio-temporal consistency, i.e., the dynamics and the photo-real quality of the original avatar should remain intact; 2) the final result should respect the user-specified edit. To this end, we present TEDRA, the first method allowing text-based edits of an avatar that are photorealistic, space-time coherent, and dynamic, and that enable skeletal pose and view control. We leverage a pre-trained avatar that is represented as a signed distance and radiance field anchored to an explicit and deformable mesh template. After this pre-training stage, we obtain a drivable and photo-real digital counterpart of the real actor. We then employ an optimization strategy to integrate various frames, capturing distinct camera perspectives and the dynamics of a video performance, into a unified, personalized diffusion model. Utilizing this personalized diffusion model, we modify the dynamic avatar based on a provided text prompt, introducing Normal Aligned Identity Preserving Score Distillation Sampling (NAIP-SDS) within a model-based guidance framework. Additionally, we implement a time-step annealing strategy to ensure the high quality of our edits. Our results demonstrate a clear improvement over prior work in terms of functionality and visual quality. Thus, our method is a clear step towards intuitive and photorealistic editability of digital avatars, which explicitly accounts for dynamics and allows skeletal pose and view control at test time.
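
The abstract mentions a time-step annealing strategy but does not specify its schedule. The following is a minimal, generic sketch of what annealing the diffusion time-step range during score-distillation-style optimization can look like. It is not the authors' implementation: the function names, the linear schedule, and the bounds (20, 980, 500) are all assumptions made for illustration.

```python
import random


def annealed_timestep_range(step, total_steps,
                            t_min=20, t_max_start=980, t_max_end=500):
    """Return (low, high) bounds for sampling a diffusion time-step.

    Hypothetical linear schedule: the upper bound shrinks from
    t_max_start to t_max_end as optimization progresses, so early
    iterations see high noise levels (coarse changes) and later
    iterations see lower noise levels (fine detail).
    All default values are illustrative assumptions.
    """
    if total_steps <= 1:
        frac = 1.0
    else:
        # Fraction of optimization completed, clamped to [0, 1].
        frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    t_max = round(t_max_start + frac * (t_max_end - t_max_start))
    return t_min, max(t_max, t_min)


def sample_timestep(step, total_steps, rng=random):
    """Draw one integer time-step uniformly from the annealed range."""
    low, high = annealed_timestep_range(step, total_steps)
    return rng.randint(low, high)
```

In an optimization loop, sample_timestep(i, n) would be called once per iteration i of n to pick the noise level at which the diffusion guidance is evaluated.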

Supplementary Material: zip

Submission Number: 106
