Expressive Gaussian Human Avatars from Monocular RGB Video

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · License: CC BY 4.0
Keywords: expressiveness, human avatar, monocular RGB video
Abstract: Nuanced expressiveness, especially through detailed hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we aim to learn expressive human avatars from a monocular RGB video, a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduce EVA, a drivable human model that can recover fine details based on 3D Gaussians and an expressive parametric human model, SMPL-X. Focusing on enhancing expressiveness, our work makes three key contributions. First, we highlight the importance of aligning the SMPL-X model with the video frames for effective avatar learning. Recognizing the limitations of current methods for estimating SMPL-X parameters from in-the-wild videos, we introduce a reconstruction module that significantly improves image-model alignment. Second, we propose a context-aware adaptive density control strategy, which adjusts the gradient thresholds to accommodate the varied granularity across body parts. Third, we develop a feedback mechanism that predicts per-pixel confidence to better guide the optimization of the 3D Gaussians. Extensive experiments on two benchmarks demonstrate the superiority of our approach both quantitatively and qualitatively, especially on fine-grained hand and facial details. We make our code available at the project website: https://evahuman.github.io.
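To make the second and third contributions concrete, the sketch below illustrates the two mechanisms the abstract describes: a densification trigger whose gradient threshold depends on the SMPL-X body part a Gaussian is bound to, and a confidence-weighted photometric loss. This is a minimal, hypothetical illustration only; the function names, the part labels, and the threshold values (`PART_THRESHOLDS`, `densification_mask`, `confidence_weighted_loss`) are our own assumptions, not taken from the paper or its released code.

```python
# Hypothetical sketch of the two abstract-level ideas: (1) context-aware
# adaptive density control, where the view-space gradient threshold that
# triggers splitting/cloning of 3D Gaussians is lower for fine-grained
# parts (hands, face) than for coarse parts (body); (2) a per-pixel
# confidence map that down-weights unreliable pixels in the loss.
# All names and numeric values are illustrative assumptions.
import torch

# Illustrative per-part thresholds: smaller values densify more readily.
PART_THRESHOLDS = {"body": 2e-4, "hand": 5e-5, "face": 5e-5}

def densification_mask(view_grad_norm: torch.Tensor,
                       part_labels: list[str]) -> torch.Tensor:
    """Select Gaussians whose accumulated view-space positional gradient
    exceeds the threshold of the SMPL-X part they are assigned to.

    view_grad_norm: (N,) mean view-space gradient norm per Gaussian.
    part_labels:    length-N list of part names, one per Gaussian.
    Returns a (N,) bool mask marking Gaussians to split or clone.
    """
    thresholds = torch.tensor(
        [PART_THRESHOLDS[p] for p in part_labels],
        device=view_grad_norm.device,
    )
    return view_grad_norm > thresholds

def confidence_weighted_loss(pred: torch.Tensor,
                             target: torch.Tensor,
                             confidence: torch.Tensor) -> torch.Tensor:
    """L1 photometric loss weighted by a predicted per-pixel confidence,
    so pixels flagged as unreliable (e.g. misaligned SMPL-X regions)
    contribute less to the Gaussian optimization."""
    return (confidence * (pred - target).abs()).mean()
```

In such a scheme, the part-dependent thresholds let hands and faces accumulate many small Gaussians while the torso stays coarse, and the confidence map acts as the feedback signal that steers optimization away from poorly aligned pixels.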
Primary Area: Machine vision
Flagged For Ethics Review: true
Submission Number: 3671
