EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks
Camera Ready Pdf: pdf
Keywords: Contact-rich task, Imitation learning, Compliant control, Geometric control, SE(3)-Equivariance
TL;DR: We propose EquiContact, a hierarchical policy that is provably SE(3)-equivariant from vision to force, for spatially generalizable contact-rich tasks.
Abstract: This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations.
We focus on achieving robust spatial generalization for a peg-in-hole (PiH) policy trained from a small number of demonstrations.
We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT operates using only localized observations, namely geometrically consistent error vectors (GCEV), force-torque readings, and RGB images from a wrist-mounted camera, and produces actions defined in the end-effector frame.
Through these design choices, we show that the entire EquiContact pipeline is $SE(3)$-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles. The experimental videos are attached as multimedia materials.
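The link between localized observations, end-effector-frame actions, and induced equivariance can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: `local_error` is an assumed, illustrative body-frame pose error (the exact GCEV definition is given in the paper), and `toy_policy` is a hypothetical stand-in for the learned G-CompACT policy. It checks that the body-frame error is unchanged when the current and target poses are moved by the same $SE(3)$ transform, and that the resulting action, once expressed in the world frame, rotates with that transform.

```python
import numpy as np

def rot(axis, angle):
    """Rotation matrix from an axis and angle (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def local_error(R, p, R_d, p_d):
    """Assumed body-frame pose error (illustrative, not the paper's exact GCEV)."""
    e_p = R.T @ (p_d - p)                  # position error seen from the end-effector
    E = R.T @ R_d - R_d.T @ R              # skew-symmetric rotation error
    e_R = 0.5 * np.array([E[2, 1], E[0, 2], E[1, 0]])
    return np.concatenate([e_p, e_R])

def toy_policy(err, gain=0.5):
    """Hypothetical policy: maps a local error to a body-frame command."""
    return gain * err

def to_world(R, action):
    """Express body-frame linear and angular commands in the world frame."""
    return np.concatenate([R @ action[:3], R @ action[3:]])

# Current end-effector pose (R, p) and target pose (R_d, p_d).
R, p = rot([0, 0, 1], 0.3), np.array([0.4, 0.1, 0.2])
R_d, p_d = rot([1, 0, 0], -0.2), np.array([0.5, 0.0, 0.1])

# Arbitrary rigid transform g = (R_g, p_g) applied to the whole scene.
R_g, p_g = rot([1, 2, 3], 1.1), np.array([-0.3, 0.7, 0.2])

err = local_error(R, p, R_d, p_d)
err_g = local_error(R_g @ R, R_g @ p + p_g, R_g @ R_d, R_g @ p_d + p_g)

# The local observation is invariant to g ...
invariant = np.allclose(err, err_g)

# ... so the world-frame action is equivariant: it rotates by R_g.
a_world = to_world(R, toy_policy(err))
a_world_g = to_world(R_g @ R, toy_policy(err_g))
equivariant = (np.allclose(a_world_g[:3], R_g @ a_world[:3])
               and np.allclose(a_world_g[3:], R_g @ a_world[3:]))

print("invariant observation:", invariant)
print("equivariant world-frame action:", equivariant)
```

The sketch covers only the pose-error channel; in the framework described above, the force-torque readings and wrist-camera images are likewise local to the end-effector.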
Supplementary Material: zip
Submission Number: 3