Abstract: Creating high-quality 3D models of clothed humans from single images for real-world applications is crucial. Despite recent advancements, accurately reconstructing humans in complex poses or with loose clothing from in-the-wild images, along with predicting textures for unseen areas, remains a significant challenge. A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction. In response, we introduce SIFU (Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction), a novel approach combining a Side-view Decoupling Transformer with a 3D Consistent Texture Refinement pipeline. SIFU employs a cross-attention mechanism within the transformer, using SMPL-X normals as queries to effectively decouple side-view features in the process of mapping 2D features to 3D. This method not only improves the precision of the 3D models but also their robustness, especially when SMPL-X estimates are not perfect. Our texture refinement process leverages a text-to-image diffusion-based prior to generate realistic and consistent textures for invisible views. Through extensive experiments, SIFU surpasses SOTA methods in both geometry and texture reconstruction, showcasing enhanced robustness in complex scenarios and achieving an unprecedented Chamfer and P2S measurement. Our approach extends to practical applications such as 3D printing and scene building, demonstrating its broad utility in real-world scenarios.
External IDs:dblp:conf/cvpr/ZhangYY24