Keywords: Offline RL, Whole Body Control, Mobile Manipulation
TL;DR: Offline RL for learning mobile manipulation using sub-optimal synthetic demonstrations without any teleoperation data
Abstract: Whole-body Mobile Manipulation (MoMa) of articulated objects -- e.g., opening doors, drawers, and cupboards -- demands simultaneous coordination of a robot's base and arms. Classical Whole-Body Controllers (WBCs) solve this via hierarchical optimization but require extensive tuning and remain brittle, while learning-based methods rely on expensive whole-body teleoperation data or heavy reward engineering. We observe that even a sub-optimal WBC is a powerful structural prior: it collects data in a constrained, task-relevant region of the state-action space, and its behavior can still be improved using offline RL. We propose WHOLE-MoMa, a two-stage pipeline that first generates diverse demonstrations by randomizing a lightweight WBC, and then applies offline RL to identify and stitch together improved behaviors via a reward signal. To support expressive action-chunked Diffusion Policies, we extend offline IQL with Q-chunking for chunk-level critic evaluation and advantage-weighted policy extraction. On three tasks of increasing difficulty with a TIAGo++ mobile manipulator, WHOLE-MoMa outperforms WBC, behavior cloning, and several offline RL baselines, and transfers directly to the real robot without teleoperated or real-world training data, achieving 80% success on bimanual drawer manipulation and 68% on simultaneous cupboard opening and object placement.
Submission Number: 11
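The abstract names three ingredients of the learning stage: an IQL-style critic, chunk-level (Q-chunking) evaluation, and advantage-weighted policy extraction. The following is a minimal NumPy sketch of how these pieces are commonly written, not the authors' implementation. It assumes the standard IQL expectile loss and an h-step bootstrapped target over an action chunk. The function names, the hyperparameters (`tau`, `gamma`, `beta`, `max_weight`), and the convention that rewards after an in-chunk termination are zero are all illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss; with diff = Q - V and tau > 0.5 it pulls V toward an upper expectile of Q."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return float(np.mean(weight * diff ** 2))


def chunk_td_target(rewards, v_next, done, gamma=0.99):
    """h-step target for a chunk-level critic Q(s, a_{t:t+h}).

    rewards: (batch, h) per-step rewards collected while executing the chunk
             (assumed to be zero after an in-chunk termination)
    v_next:  (batch,) value estimate at the state reached after the chunk
    done:    (batch,) 1.0 if the episode ended within the chunk, else 0.0
    """
    h = rewards.shape[1]
    discounts = gamma ** np.arange(h)      # [1, gamma, ..., gamma^(h-1)]
    chunk_return = rewards @ discounts     # discounted reward sum over the chunk
    return chunk_return + (gamma ** h) * (1.0 - done) * v_next


def advantage_weights(q, v, beta=3.0, max_weight=100.0):
    """Clipped exponentiated advantages used to weight the policy's per-sample loss."""
    return np.minimum(np.exp(beta * (q - v)), max_weight)
```

In a full pipeline of this kind, the weights from `advantage_weights` would typically multiply the per-sample denoising loss of the chunked Diffusion Policy, so that chunks with higher estimated advantage contribute more to the update.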