Keywords: 3D Computer Vision, Neural Rendering, 3D Avatar Modeling
TL;DR: We propose a 3D Gaussian splatting avatar method that realistically models loose clothing and dynamic appearances from monocular videos, overcoming the limitations of template-based approaches.
Abstract: Recent advances in neural rendering, particularly 3D Gaussian Splatting (3DGS), have enabled animatable 3D human avatars from single videos with efficient rendering and high fidelity. However, current methods struggle with dynamic appearances, especially in loose garments (e.g., skirts), causing unrealistic cloth motion and needle artifacts. This paper introduces a novel approach to dynamic appearance modeling for 3DGS-based avatars, focusing on loose clothing. We identify two key challenges: (1) limited Gaussian deformation under pre-defined template articulation, and (2) a mismatch between body-template assumptions and the geometry of loose apparel. To address these issues, we propose a motion-aware autoregressive structural deformation framework for Gaussians. We structure Gaussians into an approximate graph and recursively predict structure-preserving updates, yielding realistic, template-free cloth dynamics. Our framework enables robust dynamic appearance modeling under the single-view constraint, producing accurate foreground silhouettes and precise alignment of Gaussian points with clothed shapes. To demonstrate the effectiveness of our method, we introduce an evaluation dataset featuring subjects performing dynamic movements in loose clothing, and extensive experiments validate that our approach significantly outperforms existing 3DGS-based methods in modeling dynamic appearances from monocular videos.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 15677