Keywords: 3D Cloth Representations, Semantic Features, Vision Foundation Models
TL;DR: This study investigates the advantages and challenges of vision foundation models in augmenting 3D representations of cloth-like deformable objects by extracting semantic information from RGB images.
Abstract: This study explores the use of vision foundation models to enhance 3D representations of cloth-like deformable objects. Focusing on distilling semantic information from RGB images, we examine the potential of pre-trained vision-language models to capture complex folded configurations of cloth. Our investigation reveals both challenges and preliminary successes in leveraging semantic information to improve the understanding and tracking of deformable object states.
Submission Number: 22