Reactive In-Air Clothing Manipulation with Confidence-Aware Dense Correspondence and Visuotactile Affordance
Keywords: Deformable Object Manipulation, Dense Correspondence Learning, Confidence-Aware Planning, Visuotactile Perception
TL;DR: We develop a visuotactile system that folds and hangs clothing in-air using dense visual representations and tactilely-supervised visual affordance networks.
Abstract: Manipulating clothing is challenging due to its complex, variable configurations and frequent self-occlusion. While prior systems often rely on flattening garments first, humans routinely identify keypoints on garments in highly crumpled and suspended states. We present a novel, task-agnostic visuotactile framework that operates directly on crumpled clothing, including in-air configurations that have not been addressed before. Our approach combines global visual perception with local tactile feedback to enable robust, reactive manipulation. We train dense visual descriptors on a custom simulated dataset using a distributional loss that captures cloth symmetries and yields correspondence confidence estimates. These estimates guide a reactive state machine that dynamically selects between folding strategies based on perceptual uncertainty. In parallel, we train a visuotactile grasp affordance network, using high-resolution tactile feedback to supervise grasp success; the same tactile classifier validates grasps in real time during execution. Together, these components enable a reactive, task-agnostic framework for in-air garment manipulation, including folding and hanging tasks. Moreover, our dense descriptors serve as a versatile intermediate representation for other planning modalities, such as extracting grasp targets from human video demonstrations, paving the way for more generalizable and scalable garment manipulation.
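To make the abstract's key idea concrete, below is a minimal sketch, not the authors' implementation, of a distributional correspondence loss with an entropy-based confidence estimate, assuming PyTorch dense descriptors; names such as `correspondence_distribution`, `tau`, and the confidence formula are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): a distributional loss over
# dense descriptors that tolerates multimodal (symmetric) matches and yields
# a per-query confidence from the entropy of the match distribution.
import torch
import torch.nn.functional as F


def correspondence_distribution(desc_query, desc_target, tau=0.07):
    """Spatial distribution over target pixels for one query descriptor.

    desc_query:  (D,)      descriptor at the query pixel in the source image
    desc_target: (D, H, W) dense descriptors of the target image
    Returns a (H*W,) probability vector (softmax over similarities).
    """
    sims = torch.einsum("d,dhw->hw", desc_query, desc_target) / tau
    return F.softmax(sims.reshape(-1), dim=0)


def distributional_loss(desc_query, desc_target, gt_heatmap):
    """Cross-entropy between predicted and ground-truth spatial distributions.

    gt_heatmap: (H, W) non-negative weights over target pixels; a multimodal
    heatmap (e.g. both sleeves of a symmetric garment) encodes cloth symmetry.
    """
    p = correspondence_distribution(desc_query, desc_target)
    q = gt_heatmap.reshape(-1)
    q = q / q.sum().clamp_min(1e-8)
    return -(q * torch.log(p.clamp_min(1e-8))).sum()


def correspondence_confidence(desc_query, desc_target):
    """Confidence in [0, 1]: 1 minus the normalized entropy of the match."""
    p = correspondence_distribution(desc_query, desc_target)
    entropy = -(p * torch.log(p.clamp_min(1e-8))).sum()
    return 1.0 - entropy / torch.log(torch.tensor(float(p.numel())))


if __name__ == "__main__":
    D, H, W = 16, 32, 32
    desc_query = F.normalize(torch.randn(D), dim=0)
    desc_target = F.normalize(torch.randn(D, H, W), dim=0)
    gt = torch.zeros(H, W)
    gt[8, 8] = 1.0    # two symmetric ground-truth matches
    gt[8, 24] = 1.0
    loss = distributional_loss(desc_query, desc_target, gt)
    conf = correspondence_confidence(desc_query, desc_target)
    print(f"loss={loss.item():.3f}  confidence={conf.item():.3f}")
```

In this sketch, a low confidence (high-entropy match distribution) would be the signal the abstract describes the state machine using to switch between folding strategies.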
Supplementary Material: pdf
Submission Number: 871