Keywords: Diffusion Models, Reinforcement Learning, Post-Training, Generalist Humanoid Policies
TL;DR: Post-Training Diffusion Models with RL for General Humanoid Loco-Manipulation
Abstract: Building generalist humanoid policies that span diverse whole-body loco-manipulation skills is bottlenecked by exploration. Directly searching a humanoid's high-dimensional action space for long-horizon behaviors is intractable, and tracking-based RL controllers, while reliable high-frequency executors, do not by themselves produce the multi-modal, task-conditioned plans that such generality requires. Diffusion models, by contrast, capture multi-modal motion distributions well but lack grounding in the physical feasibility of a specific embodiment. We exploit this complementarity by placing diffusion at the motion level rather than the action level, so that the diffusion model explores over motions while high-frequency control is left to the tracker: a diffusion generator, pretrained on dynamically feasible retargeted demonstrations, restricts the search space to a learned manifold of plausible motions, and a generalist RL tracking controller realizes those motions on the robot. To close the residual gap between what the motion generator proposes and what the controller can execute, we further post-train the diffusion model with RL, rolling out the tracker in the loop and optimizing trackability and kinematic-fidelity rewards. The result is a single pipeline that learns from imperfect kinematic demonstrations and generalizes across loco-manipulation skills, including behaviors unseen during pretraining.
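The abstract does not specify the post-training algorithm, so the following is only a toy sketch of the general idea of RL post-training a diffusion sampler with a tracker in the loop. It assumes a DDPO-style score-function (REINFORCE) estimator over the denoising steps, which is a swapped-in technique rather than the authors' confirmed method. Everything else is hypothetical: a linear Gaussian "denoiser" with learnable per-step offsets `b`, a clipping function `toy_tracker` standing in for the RL tracking controller, a made-up `REFERENCE` motion, and an unweighted sum of a trackability term (executed vs. proposed motion) and a kinematic-fidelity term (proposed vs. reference motion).

```python
import numpy as np

# Toy hyperparameters (all hypothetical, chosen only for illustration).
T = 5            # number of denoising steps
DIM = 2          # "motion" dimensionality
A = 0.5          # fixed contraction factor of the denoiser mean
SIGMA = 0.3      # fixed per-step noise scale
LIMIT = 1.0      # toy tracker cannot realize motion beyond +/- LIMIT
REFERENCE = np.array([1.5, 0.5])  # hypothetical task reference motion


def sample_chain(b, n, rng):
    """Run the stochastic denoising chain x_T -> x_0.

    Each step draws x_t ~ N(A * x_{t+1} + b[t], SIGMA^2 I). Returns the final
    motions x_0 and the standardized noise of every step, which is all the
    Gaussian score with respect to b[t] needs.
    """
    x = rng.standard_normal((n, DIM))  # x_T ~ N(0, I)
    eps_per_step = np.zeros((T, n, DIM))
    for t in range(T - 1, -1, -1):  # step t produces x_t from x_{t+1}
        eps = rng.standard_normal((n, DIM))
        x = A * x + b[t] + SIGMA * eps
        eps_per_step[t] = eps
    return x, eps_per_step


def toy_tracker(motion):
    """Hypothetical stand-in for the RL tracker: saturates at joint limits."""
    return np.clip(motion, -LIMIT, LIMIT)


def reward(motion):
    executed = toy_tracker(motion)  # roll out the "tracker" in the loop
    trackability = -np.sum((executed - motion) ** 2, axis=1)
    fidelity = -np.sum((motion - REFERENCE) ** 2, axis=1)
    return trackability + fidelity


def post_train(b, iters=150, n=256, lr=0.05, seed=0):
    """REINFORCE over the whole denoising chain with a mean-reward baseline."""
    rng = np.random.default_rng(seed)
    b = b.copy()
    for _ in range(iters):
        motion, eps = sample_chain(b, n, rng)
        r = reward(motion)
        adv = r - r.mean()  # baseline reduces gradient variance
        # d/db_t log N(x_t; mean_t, SIGMA^2 I) = (x_t - mean_t) / SIGMA^2
        #                                      = eps_t / SIGMA
        grad = np.einsum("n,tnd->td", adv, eps) / (n * SIGMA)
        b += lr * grad  # gradient ascent on expected reward
    return b


def evaluate(b, n=4096, seed=1):
    rng = np.random.default_rng(seed)
    motion, _ = sample_chain(b, n, rng)
    return reward(motion).mean(), motion.mean(axis=0)


b0 = np.zeros((T, DIM))  # "pretrained" sampler: motions centred at the origin
b1 = post_train(b0)
r0, m0 = evaluate(b0)
r1, m1 = evaluate(b1)
print(f"before: mean reward={r0:.2f}, mean motion={m0.round(2)}")
print(f"after:  mean reward={r1:.2f}, mean motion={m1.round(2)}")
```

In this toy, the reference lies partly outside the tracker's limit, so post-training settles on a compromise between reproducing the reference and proposing motions the tracker can actually execute. That trade-off is the kind of gap the abstract's trackability and kinematic-fidelity rewards are meant to balance.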
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 37