Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control

Published: 26 Jan 2026, Last Modified: 11 Apr 2026 | ICLR 2026 Poster | CC BY 4.0
Keywords: robot policy learning, offline reinforcement learning, whole-body control
Abstract: Scaling imitation learning to high-DoF whole-body robots is fundamentally constrained by the scarcity of expert demonstrations. In contrast, large amounts of suboptimal data are readily available and offer a practical way to alleviate supervision bottlenecks in real-world whole-body control. However, leveraging such data introduces two central challenges: how to extract informative signals from imperfect trajectories, and how to cope with the increased learning complexity induced by high-dimensional control. To address both challenges, we propose **HVD** (Hierarchical Value-Decomposed Offline Reinforcement Learning). The offline RL formulation provides principled data selection over suboptimal datasets, enabling the policy to prioritize high-value behaviors while down-weighting harmful ones. Complementarily, hierarchical value decomposition organizes learning along the robot’s kinematic structure, improving credit assignment and reducing learning complexity in high-DoF systems. Built on a Transformer-based architecture, HVD supports *multi-modal* and *multi-task* learning, allowing flexible integration of diverse sensory inputs. To enable realistic training and evaluation, we further introduce **WB-50**, a 50-hour dataset of teleoperated and policy-rollout trajectories that are annotated with rewards and preserve natural imperfections, including partial successes, corrections, and failures. Experiments show that HVD significantly outperforms existing baselines in success rate across complex whole-body tasks. Our results suggest that effective policy learning for high-DoF systems can emerge not from perfect demonstrations, but from structured learning over realistic, imperfect data. Our code is available at https://github.com/LAMDA-RL/HVD.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 18935
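
Illustrative sketch: the abstract says the offline RL formulation prioritizes high-value behaviors, down-weights harmful ones, and organizes learning along the robot's kinematic structure, but it does not give the objective. The Python below is a minimal sketch of one common way to get that effect, assuming an exponentiated-advantage (advantage-weighted regression style) weighting applied separately per body part. It is not taken from the HVD code. The part names (`base`, `torso`, `arms`), the temperature `beta`, the clip `w_max`, and the advantage values are all hypothetical.

```python
import math

def advantage_weights(advantages, beta=1.0, w_max=20.0):
    """Exponentiated-advantage weights, clipped at w_max for stability.

    Assumption: AWR-style weighting; HVD's actual rule may differ.
    """
    return [min(math.exp(a / beta), w_max) for a in advantages]

def weighted_bc_loss(per_sample_losses, advantages, beta=1.0, w_max=20.0):
    """Weighted average of per-sample imitation losses."""
    weights = advantage_weights(advantages, beta, w_max)
    total = sum(w * l for w, l in zip(weights, per_sample_losses))
    return total / sum(weights)

# Hypothetical per-part advantages for three logged transitions.
# Each body part gets its own advantage estimate, so a transition with
# good base motion but poor arm motion is not credited uniformly.
part_advantages = {
    "base":  [0.5, -1.0, 0.0],
    "torso": [0.2, -0.5, 0.1],
    "arms":  [1.0, -2.0, 0.3],
}

part_weights = {
    part: advantage_weights(adv) for part, adv in part_advantages.items()
}

for part, weights in part_weights.items():
    print(part, [round(w, 3) for w in weights])
```

Under these assumptions, transitions with positive advantage for a given part receive weights above 1 and those with negative advantage receive weights below 1, so imperfect trajectories still contribute where they are useful.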