Keywords: Imitation Learning, Trajectory-Constrained Task, Multi-modality, Force Safety
Abstract: In unstructured environments, robotic manipulation tasks involving objects with constrained motion trajectories—such as door opening—often experience discrepancies between the robot's vision-guided end-effector trajectory and the object's constrained motion path.
Such discrepancies generate unintended harmful forces, which, if exacerbated, may lead to task failure and potential damage to the manipulated objects or the robot itself. To address this issue, this paper introduces a novel diffusion framework, termed SafeDiff. Unlike conventional methods that sequentially fuse visual and tactile data to predict future robot states, our approach generates a prospective state sequence based on the current robot state and visual context observations, using real-time force feedback as a calibration signal.
This calibration implicitly adjusts the robot's state within the state space, improving task success rates and significantly reducing harmful forces during manipulation, thereby ensuring force safety. Additionally, we develop a large-scale simulation dataset named SafeDoorManip50k, offering extensive multimodal data for training and evaluating the proposed method. Extensive experiments show that our visual-tactile model substantially mitigates the risk of harmful forces in the door-opening task in both simulated and real-world settings.
Submission Number: 413