Expressive Gaussian Human Avatars from Monocular RGB Video

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · License: CC BY 4.0
Keywords: expressiveness, human avatar, monocular RGB video
Abstract: Nuanced expressiveness, especially through detailed hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we aim to learn expressive human avatars from a monocular RGB video, a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduce EVA, a drivable human model that can recover fine details based on 3D Gaussians and an expressive parametric human model, SMPL-X. Focusing on enhancing expressiveness, our work makes three key contributions. First, we highlight the importance of aligning the SMPL-X model with the video frames for effective avatar learning. Recognizing the limitations of current methods for estimating SMPL-X parameters from in-the-wild videos, we introduce a reconstruction module that significantly improves image-model alignment. Second, we propose a context-aware adaptive density control strategy, which adjusts the gradient thresholds to accommodate the varied granularity across body parts. Third, we develop a feedback mechanism that predicts per-pixel confidence to better guide the optimization of the 3D Gaussians. Extensive experiments on two benchmarks demonstrate the superiority of our approach both quantitatively and qualitatively, especially on fine-grained hand and facial details. We make our code available at the project website: https://evahuman.github.io.
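To make the second and third contributions concrete, the sketch below illustrates the two mechanisms the abstract describes: a densification trigger whose gradient threshold depends on the SMPL-X body part a Gaussian is bound to, and a confidence-weighted photometric loss. This is a minimal, hypothetical illustration only; the function names, the part labels, and the threshold values (`PART_THRESHOLDS`, `densification_mask`, `confidence_weighted_loss`) are our own assumptions, not taken from the paper or its released code.

```python
# Hypothetical sketch of the two abstract-level ideas: (1) context-aware
# adaptive density control, where the view-space gradient threshold that
# triggers splitting/cloning of 3D Gaussians is lower for fine-grained
# parts (hands, face) than for coarse parts (body); (2) a per-pixel
# confidence map that down-weights unreliable pixels in the loss.
# All names and numeric values are illustrative assumptions.
import torch

# Illustrative per-part thresholds: smaller values densify more readily.
PART_THRESHOLDS = {"body": 2e-4, "hand": 5e-5, "face": 5e-5}

def densification_mask(view_grad_norm: torch.Tensor,
                       part_labels: list[str]) -> torch.Tensor:
    """Select Gaussians whose accumulated view-space positional gradient
    exceeds the threshold of the SMPL-X part they are assigned to.

    view_grad_norm: (N,) mean view-space gradient norm per Gaussian.
    part_labels:    length-N list of part names, one per Gaussian.
    Returns a (N,) bool mask marking Gaussians to split or clone.
    """
    thresholds = torch.tensor(
        [PART_THRESHOLDS[p] for p in part_labels],
        device=view_grad_norm.device,
    )
    return view_grad_norm > thresholds

def confidence_weighted_loss(pred: torch.Tensor,
                             target: torch.Tensor,
                             confidence: torch.Tensor) -> torch.Tensor:
    """L1 photometric loss weighted by a predicted per-pixel confidence,
    so pixels flagged as unreliable (e.g. misaligned SMPL-X regions)
    contribute less to the Gaussian optimization."""
    return (confidence * (pred - target).abs()).mean()
```

In such a scheme, the part-dependent thresholds let hands and faces accumulate many small Gaussians while the torso stays coarse, and the confidence map acts as the feedback signal that steers optimization away from poorly aligned pixels.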
Primary Area: Machine vision
Flagged For Ethics Review: true
Submission Number: 3671
