Dreaming the Sound of Contact: Leveraging Video and Audio Generation for Zero-Shot Force-Aware Manipulation
Keywords: Force-aware manipulation, video and audio generation
TL;DR: We pair video generation models with generated audio, used as a proxy for contact force, enabling robots to perform force-sensitive tasks.
Abstract: Recent advances in video generation enable learning robot manipulation trajectories from generated videos. However, these approaches produce purely kinematic trajectories that lack force information, leading to failure in contact-rich tasks where appropriate contact forces are essential for success. Generated audio carries a complementary and underexplored signal: contact sounds encode force dynamics that video alone cannot capture. We present a pipeline that jointly leverages generated video and audio to recover both motion trajectories and contact force profiles from a single task description. We execute these force-aware trajectories on a Franka Panda robot using a closed-loop force regulator that tracks the audio-derived force profile during contact. Real-robot experiments on whiteboard wiping and vegetable peeling demonstrate that our force-aware pipeline enables successful contact-rich manipulation from video generation, where a kinematic-only baseline fails. Project website + Videos: https://dreamingcontact.github.io/.
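As an illustration only, the idea of tracking an audio-derived force profile with a closed-loop regulator could be sketched as below. The abstract does not specify how audio is mapped to force or how the regulator is implemented, so everything here is an assumption: the RMS-loudness-to-force mapping, the linear scaling between the hypothetical bounds `f_min` and `f_max`, the PI update, and all gains and function names.

```python
import numpy as np


def audio_to_force_profile(audio, sample_rate, frame_s=0.05, f_min=1.0, f_max=10.0):
    """Map a mono waveform to a per-frame target normal force in newtons.

    Assumption (not from the paper): louder contact sound means larger force,
    with a linear scaling of normalized RMS loudness into [f_min, f_max].
    """
    frame = max(1, int(frame_s * sample_rate))
    n_frames = len(audio) // frame
    if n_frames == 0:
        return np.zeros(0)
    # Split the waveform into non-overlapping frames and compute RMS per frame.
    frames = np.asarray(audio[: n_frames * frame], dtype=float).reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    norm = rms / peak if peak > 0 else np.zeros_like(rms)
    return f_min + norm * (f_max - f_min)


def force_regulator_step(f_target, f_measured, integral,
                         kp=0.0005, ki=0.0001, dt=0.01, max_step=0.002):
    """One PI update of a hypothetical force regulator.

    Returns (dz, integral), where dz is a position offset in meters along the
    contact normal (positive pushes into the surface), clipped to max_step.
    """
    error = f_target - f_measured
    integral = integral + error * dt
    dz = float(np.clip(kp * error + ki * integral, -max_step, max_step))
    return dz, integral


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    # Synthetic stand-in for generated audio: one quiet second, one loud second.
    audio = np.concatenate([0.1 * tone, 1.0 * tone])
    profile = audio_to_force_profile(audio, sr)
    dz, integral = force_regulator_step(profile[-1], f_measured=8.0, integral=0.0)
    print(len(profile), profile[0], profile[-1], dz)
```

In a real system, `f_measured` would come from the robot's force/torque estimate and `dz` would be added to the kinematic trajectory recovered from the generated video; both connections are left out here because the abstract does not describe them.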
Submission Number: 42