Chat-Driven 3D Human Pose and Shape Editing with Large Language Models

Published: 2025, Last Modified: 07 Jan 2026, ICASSP 2025, License: CC BY-SA 4.0
Abstract: Generating humanoid 3D models has received increasing attention recently because such models underpin many high-level 3D applications. Although automatic 3D pose and shape reconstruction methods have achieved promising results, they still fail in some cases due to self-occlusion, viewpoint changes, and the complexity of human pose articulation. In this paper, we propose a novel approach that leverages Large Language Models (LLMs) to interactively reconstruct human pose and shape based on the Skinned Multi-Person Linear (SMPL) model. We construct a mapping table to fine-tune an LLM, enabling it to better understand user inputs and output positional information for the joints. Additionally, a simple neural network is adopted to regress the shape cues of the SMPL model. We present a gallery of results covering numerous poses and shapes, and we validate our method through numerical evaluations, user studies, and comparisons with manually posed characters and previous work.
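The abstract does not specify the format of the mapping table or of the LLM's joint output. The sketch below illustrates one plausible shape for that pipeline under stated assumptions: table rows pair a user phrase with target joint positions, the rows become prompt/completion pairs for fine-tuning, and the model's reply is a JSON object mapping SMPL joint names to `[x, y, z]` positions that is validated before being applied. Only the 24-joint SMPL ordering is standard; `MAPPING_TABLE`, its coordinates, `to_finetune_records`, `parse_joint_reply`, and the JSON schema are hypothetical illustrations, not the authors' implementation.

```python
import json

# Standard SMPL kinematic joint order (24 joints).
SMPL_JOINTS = [
    "pelvis", "left_hip", "right_hip", "spine1", "left_knee", "right_knee",
    "spine2", "left_ankle", "right_ankle", "spine3", "left_foot", "right_foot",
    "neck", "left_collar", "right_collar", "head", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hand", "right_hand",
]
JOINT_INDEX = {name: i for i, name in enumerate(SMPL_JOINTS)}

# Hypothetical mapping table: user phrasing -> target joint positions (x, y, z).
# Entries and coordinates are illustrative placeholders, not data from the paper.
MAPPING_TABLE = [
    ("raise the left hand above the head", {"left_wrist": [0.2, 0.75, 0.0]}),
    ("put the right hand on the hip", {"right_wrist": [-0.2, 0.0, 0.05]}),
]


def to_finetune_records(table):
    """Turn mapping-table rows into prompt/completion pairs (hypothetical format)."""
    return [
        {"prompt": text, "completion": json.dumps(joints, sort_keys=True)}
        for text, joints in table
    ]


def parse_joint_reply(reply, rest_positions):
    """Apply an LLM reply (JSON of joint name -> [x, y, z]) to 24 joint positions.

    Malformed JSON, unknown joint names, or bad coordinates raise ValueError,
    so a bad generation never silently corrupts the pose.
    """
    updates = json.loads(reply)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(updates, dict):
        raise ValueError("reply must be a JSON object of joint -> [x, y, z]")
    positions = [list(p) for p in rest_positions]  # copy; leave input untouched
    for name, xyz in updates.items():
        if name not in JOINT_INDEX:
            raise ValueError(f"unknown SMPL joint: {name}")
        if not isinstance(xyz, list) or len(xyz) != 3:
            raise ValueError(f"expected 3 coordinates for {name}")
        positions[JOINT_INDEX[name]] = [float(v) for v in xyz]
    return positions


if __name__ == "__main__":
    rest = [[0.0, 0.0, 0.0]] * 24  # placeholder rest pose, all joints at origin
    records = to_finetune_records(MAPPING_TABLE)
    # Pretend the fine-tuned model returned the first completion verbatim.
    pose = parse_joint_reply(records[0]["completion"], rest)
    print(pose[JOINT_INDEX["left_wrist"]])  # [0.2, 0.75, 0.0]
```

Validating every reply against the fixed SMPL joint list is a deliberate choice in this sketch: an LLM can emit plausible but nonexistent joint names, and rejecting them early keeps the downstream SMPL pose consistent.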