Keywords: Diffusion model, 3D policy, Bi-manual manipulation, Imitation learning
Abstract: We present a conceptually simple and general framework for bi-manual manipulation that extends the state-of-the-art 3D diffusion policy, 3D Diffuser Actor, by redefining the robot action in a bi-manual form. The method, called Bi3D Diffuser Actor, uses 3D scene feature representations aggregated from posed camera views and sensed depth, conditions on language instructions, and jointly generates 3D trajectories for the left and right robot end effectors. While most baselines struggle with the complexity of two-hand dynamics, our approach both handles action multimodality effectively and generates coordinated, synergistic two-hand motions, even in more challenging scenarios. Bi3D Diffuser Actor, trained in a multi-task setting, establishes a new state of the art on PerAct2, with an absolute performance gain of 42.5% over prior approaches that are trained in single-task settings. We hope our simple yet effective approach will serve as a strong baseline and facilitate further research in bi-manual and dexterous manipulation.
Submission Number: 38