Preferenced Oracle Guided Multi-mode Policies for Dynamic Bipedal Loco-Manipulation

Prashanth Ravichandar; Lokesh Krishna; Nikhil Sobanbabu; Quan Nguyen

Published: 15 Jun 2025, Last Modified: 25 Jun 2025

Venue: RSS 2025 Workshop WCBM Poster

License: CC BY 4.0

Keywords: Whole-body Control, Loco-Manipulation, Reinforcement Learning

Abstract: Dynamic loco-manipulation requires whole-body control and contact-rich interactions with objects and the environment. Existing learning-based approaches either train a high-level policy or hand-design a finite state machine to switch between independently trained low-level skill policies, which often results in quasi-static behaviors. We propose Preferenced Oracle Guided Multi-mode Policies to learn a single policy that masters multiple modes and their preferred sequences of transitions. We design hybrid automatons as oracles that generate references over different control modes, and use them to guide policy optimization through bounded exploration. With a task-agnostic preference reward, we enforce learning a desired sequence of mode transitions, thereby solving the task effectively. In uni-object loco-manipulation tasks such as omnidirectional box moving and soccer, our approach produces whole-body control and smooth transitions, enabling contact-rich dribbling, goal kicks, and ball stops. Leveraging the oracle's abstraction, we solve each loco-manipulation task across diverse robot morphologies, including HECTOR V1, Berkeley Humanoid, Unitree G1, and H1, using the same reward definition and weights. Project website: [https://indweller.github.io/ogmplm/](https://indweller.github.io/ogmplm/)
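
To make the ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows a toy mode-switching oracle, a preference reward over mode transitions, and a bounded-exploration check. All names (`Mode`, `PREFERRED_NEXT`, `oracle_mode`, `preference_reward`, `within_bound`), the choice of modes, the thresholds, and the reward values are assumptions made for illustration only.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical control modes for a single-object task."""
    REACH = 0       # move toward the object
    MANIPULATE = 1  # in contact, moving the object toward the goal
    DONE = 2        # object is at the goal


# Assumed preferred ordering of mode transitions (illustrative only).
PREFERRED_NEXT = {Mode.REACH: Mode.MANIPULATE, Mode.MANIPULATE: Mode.DONE}


def oracle_mode(dist_robot_to_object, dist_object_to_goal,
                contact_radius=0.3, goal_radius=0.1):
    """Toy automaton guard logic: pick a mode from two distances."""
    if dist_object_to_goal <= goal_radius:
        return Mode.DONE
    if dist_robot_to_object <= contact_radius:
        return Mode.MANIPULATE
    return Mode.REACH


def preference_reward(prev_mode, mode):
    """Reward preferred transitions, penalize others, ignore no-change."""
    if mode == prev_mode:
        return 0.0
    return 1.0 if PREFERRED_NEXT.get(prev_mode) == mode else -1.0


def within_bound(state, reference, bound):
    """Bounded exploration check: every state component stays within
    `bound` of the oracle's reference (assumed termination criterion)."""
    return all(abs(s - r) <= bound for s, r in zip(state, reference))
```

In this sketch the preference reward depends only on the mode labels, not on task quantities, which is one simple way a reward could be kept task-agnostic; how the paper actually defines it is not stated in the abstract.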

Submission Number: 12
