Keywords: Human Mesh Recovery, Gaussian Splatting, Rendering
TL;DR: We propose a human mesh recovery method that achieves pixel-level alignment.
Abstract: Reconstructing a human mesh from a single in-the-wild image has long been a central research direction in computer vision. Existing approaches often provide only coarse reconstructions of the overall human structure and exhibit noticeable misalignment in fine-grained regions such as the face and hands. These subtle deviations can be progressively amplified in downstream tasks, leading to significant errors in the final results. To address this issue, we propose PEAR, a unified framework for human mesh recovery and rendering. PEAR explicitly tackles two major limitations of current methods: inaccurate localization of fine-grained human pose details and insufficient photometric supervision for self-reconstruction.
Specifically, we train a transformer-based model to recover expressive 3D human geometry from a single 2D image, and integrate it with a neural renderer to jointly optimize geometry and appearance. This synergy substantially improves the accuracy of fine-grained human geometry while yielding higher-quality rendering results. In addition, we construct a large-scale dataset of images and videos with human annotations to support model training. Extensive experiments on multiple benchmark datasets demonstrate that the proposed approach achieves significant improvements in both geometric reconstruction accuracy and rendering quality.
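To make the described synergy concrete, below is a minimal sketch of a training step that couples geometric parameter supervision with photometric self-reconstruction. All names here (`PoseTransformer`, `DiffRenderer`, `joint_loss`, the parameter dimension, and the loss weights) are hypothetical placeholders, not the paper's actual architecture or API; the stand-in decoder merely takes the place of a real differentiable (e.g., Gaussian-splatting) renderer.

```python
# Hedged sketch of joint geometry + photometric optimization, as described in
# the abstract. Module names, dimensions, and weights are assumptions, not the
# paper's implementation.
import torch
import torch.nn as nn

class PoseTransformer(nn.Module):
    """Hypothetical transformer encoder that regresses expressive body
    parameters (e.g., SMPL-X-style pose/shape/expression) from one image."""
    def __init__(self, n_params=179):  # parameter dimension is assumed
        super().__init__()
        # Stand-in patch embedding for a real transformer backbone.
        self.patchify = nn.Sequential(nn.Conv2d(3, 64, 16, stride=16), nn.Flatten(2))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(64, n_params)

    def forward(self, img):                          # img: (B, 3, H, W)
        tokens = self.patchify(img).transpose(1, 2)  # (B, N, 64)
        feats = self.encoder(tokens).mean(dim=1)     # pooled token features
        return self.head(feats)                      # (B, n_params)

class DiffRenderer(nn.Module):
    """Placeholder for a differentiable neural renderer mapping body
    parameters to an RGB image; a real system would splat 3D Gaussians."""
    def __init__(self, n_params=179, hw=64):
        super().__init__()
        self.hw = hw
        self.decode = nn.Linear(n_params, 3 * hw * hw)

    def forward(self, params):
        return self.decode(params).view(-1, 3, self.hw, self.hw)

def joint_loss(pred_params, gt_params, rendered, target, w_geo=1.0, w_photo=1.0):
    """Joint objective: geometric supervision on parameters plus a
    photometric self-reconstruction term on the rendered image."""
    geo = nn.functional.mse_loss(pred_params, gt_params)
    photo = nn.functional.l1_loss(rendered, target)
    return w_geo * geo + w_photo * photo

# One illustrative training step; random tensors stand in for real data.
model, renderer = PoseTransformer(), DiffRenderer()
opt = torch.optim.Adam(list(model.parameters()) + list(renderer.parameters()), lr=1e-4)
img, gt_params = torch.rand(2, 3, 64, 64), torch.rand(2, 179)
params = model(img)
loss = joint_loss(params, gt_params, renderer(params), img)
opt.zero_grad(); loss.backward(); opt.step()
```

The point of the sketch is the shared gradient path: the photometric term backpropagates through the renderer into the pose parameters, which is how joint optimization can refine fine-grained geometry beyond what parameter supervision alone provides.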
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4457