Abstract: Reconstructing garments from monocular videos has attracted considerable attention, as it offers a convenient and low-cost solution for clothing digitization. In reality, people wear clothing with countless variations and in multiple layers. Existing studies attempt to extract garments from a single video; they either generalize poorly due to their reliance on limited clothing templates or fail to handle the intersections of multi-layered clothing, losing physical plausibility. Moreover, a single video inevitably contains undetectable overlaps between layers, which prevents modeling complete and intersection-free multi-layered clothing. To address these limitations, we propose a novel method that sequentially reconstructs multi-layered clothing from multiple monocular videos, surpassing existing work in generalization and robustness against penetration. For each video, neural fields implicitly represent the clothed body, from which meshes with frame-consistent structures are explicitly extracted. Next, a template-free method extracts a single garment by back-projecting the image segmentation labels of different frames onto these meshes. In this way, multiple garments are obtained from the monocular videos and then aligned to form the whole outfit. However, intersections still occur, caused by overlapping deformation in the real world and by perceptual errors inherent to monocular video. To this end, we introduce a physics-aware module that combines neural fields with a position-based simulation framework to fine-tune the penetrating garment vertices, robustly guaranteeing intersection-free results. Additionally, we collect a small dataset of fashionable garments to comprehensively evaluate the quality of clothing reconstruction. The code and data will be open-sourced upon acceptance.
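To make the penetration-handling step concrete, the sketch below illustrates one position-based correction in the spirit of the physics-aware module: outer-garment vertices found on the wrong side of the inner layer's surface are projected back outside along the local surface normal. This is a minimal illustration, not the authors' implementation; the mesh names, the use of `trimesh` proximity queries, and the `eps` margin are all assumptions.

```python
# Illustrative sketch of a position-based penetration fix between two garment
# layers. Assumes `trimesh` for proximity queries; `eps` is a hypothetical
# collision margin, not a value from the paper.
import numpy as np
import trimesh

def resolve_penetrations(outer: trimesh.Trimesh,
                         inner: trimesh.Trimesh,
                         eps: float = 2e-3) -> np.ndarray:
    """Push outer-garment vertices that penetrate the inner layer back out."""
    # Closest point on the inner surface for every outer-garment vertex.
    closest, dist, tri_id = trimesh.proximity.closest_point(inner, outer.vertices)
    normals = inner.face_normals[tri_id]       # outward normals of the inner layer
    offset = outer.vertices - closest
    # A vertex penetrates when it lies on the negative side of the inner surface.
    penetrating = np.einsum('ij,ij->i', offset, normals) < 0.0
    fixed = outer.vertices.copy()
    # Position-based correction: project onto the surface plus a small margin.
    fixed[penetrating] = closest[penetrating] + eps * normals[penetrating]
    return fixed
```

In the paper's pipeline this kind of correction would run inside the position-based simulation loop, guided by the neural fields, rather than as the single projection step shown here.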
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Experience] Multimedia Applications, [Experience] Interactions and Quality of Experience
Relevance To Conference: This paper reconstructs multi-layered clothing from monocular videos, enabling cost-effective and convenient digitization of garments.
Our work has significant practical value for multimedia applications such as gaming, the metaverse, and virtual reality.
We propose a state-of-the-art solution that combines neural fields with physics simulation.
Our main contributions include a template-free approach with strong generalization for extracting a complete garment, a physics-aware module that robustly keeps multi-layered clothing intersection-free with high quality, and a small multimedia dataset collected from both physical simulation and the real world.
We believe IF-Garments will benefit downstream multimedia tasks such as human performance capture, personalized avatar modeling, and virtual try-on.
Supplementary Material: zip
Submission Number: 3083