ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

Published: 06 May 2026, Last Modified: 06 May 2026 | CR2@ICRA2026 Poster | Everyone | CC BY 4.0
Keywords: Imitation Learning, Contact-Rich Manipulation, Visual-Force Policy Learning, Slow-Fast Policy Learning
TL;DR: We propose ImplicitRDP, a unified end-to-end visual-force diffusion policy that integrates slow visual planning and fast force control.
Abstract: Contact-rich manipulation requires combining global visual context with local force feedback. We propose ImplicitRDP, an end-to-end visual-force diffusion policy that unifies visual planning and reactive force control in a single network. Our Structural Slow-Fast Learning uses causal attention to process low-frequency visual tokens and high-frequency force tokens for closed-loop control within an action chunk. We further introduce Virtual-target-based Representation Regularization, which predicts a virtual target in the action space to encourage effective use of force feedback and to avoid modality collapse. Experiments on real-world contact-rich tasks show that ImplicitRDP outperforms both vision-only and hierarchical visual-force baselines with a simpler training pipeline.
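
To make the slow-fast idea concrete, here is a minimal sketch of one way a causal attention mask over low-frequency visual tokens and high-frequency force tokens could be laid out. It is an illustration under stated assumptions, not the authors' implementation. The function name `slow_fast_causal_mask`, the token ordering (visual tokens first, then force tokens in time order), and the choice to let visual tokens attend only to each other are all hypothetical.

```python
import numpy as np


def slow_fast_causal_mask(num_visual: int, num_force: int) -> np.ndarray:
    """Build a boolean attention mask; True means the query (row) may attend to the key (column).

    Assumed token order (hypothetical, not from the paper):
    [visual_0 .. visual_{V-1}, force_0 .. force_{F-1}], with force tokens in time order.
    """
    total = num_visual + num_force
    mask = np.zeros((total, total), dtype=bool)

    # Slow stream: visual tokens are observed once per action chunk and attend to each other.
    mask[:num_visual, :num_visual] = True

    # Fast stream: every force token can read the full visual context.
    mask[num_visual:, :num_visual] = True

    # Fast stream: force token t attends only to force tokens 0..t (causal),
    # so a prediction at step t never depends on future force readings.
    mask[num_visual:, num_visual:] = np.tril(
        np.ones((num_force, num_force), dtype=bool)
    )
    return mask


if __name__ == "__main__":
    # Two visual tokens followed by three force tokens.
    print(slow_fast_causal_mask(2, 3).astype(int))
```

With a mask of this shape, new force readings arriving within an action chunk can be appended as later tokens without changing what earlier tokens attended to, which is one way a single network could react to force feedback at a higher rate than it receives images.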
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Video: mp4
Submission Number: 2