Keywords: Humanoid Robots, Reinforcement Learning, Whole-body Control
TL;DR: We decouple humanoid control into high-frequency arm and low-frequency leg policies, enabling stable end-effector control during dynamic locomotion.
Abstract: Can your humanoid walk up and hand you a full cup of beer—without spilling a drop?
While humanoids are increasingly featured in flashy demos (dancing, delivering packages, traversing rough terrain), fine-grained control during locomotion remains a significant challenge. In particular, stabilizing a filled end-effector (EE) while walking is far from solved, due to a fundamental mismatch in task dynamics: locomotion demands slow-timescale, robust control, whereas EE stabilization requires rapid, high-precision corrections. To address this, we propose SoFTA, a Slow-Fast Two-Agent framework that decouples upper-body and lower-body control into separate agents operating at different frequencies and with distinct rewards. This temporal and objective separation mitigates policy interference and enables coordinated whole-body behavior. SoFTA executes upper-body actions at 100 Hz for precise EE control and lower-body actions at 50 Hz for robust gait. It reduces EE acceleration by 2-5x relative to baselines and performs much closer to human-level stability, enabling delicate tasks such as carrying nearly full cups, capturing steady video during locomotion, and rejecting disturbances while keeping the EE stable. We validate SoFTA on both Unitree G1 and Booster T1, showing strong cross-platform generalization.
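The two-rate execution described in the abstract (upper body at 100 Hz, lower body at 50 Hz) can be pictured with a minimal scheduling sketch. This is not the authors' implementation: the function `run_two_rate_loop`, the placeholder policies, the integer "observation", and the choice to hold the last lower-body action between its updates are all assumptions made for illustration. The distinct per-agent rewards are a training-time detail and are not shown.

```python
from typing import Callable, List, Optional, Tuple

UPPER_HZ = 100  # upper-body (arm) action rate stated in the abstract
LOWER_HZ = 50   # lower-body (leg) action rate stated in the abstract


def run_two_rate_loop(
    upper_policy: Callable[[int], str],
    lower_policy: Callable[[int], str],
    num_ticks: int,
) -> List[Tuple[int, str, Optional[str]]]:
    """Step a 100 Hz base loop, refreshing the lower-body action every other tick.

    Assumption: between lower-body updates the previous leg action is held
    unchanged (a zero-order hold); the abstract does not specify this.
    """
    ratio = UPPER_HZ // LOWER_HZ  # base ticks per lower-body update (2 here)
    log: List[Tuple[int, str, Optional[str]]] = []
    lower_action: Optional[str] = None
    for tick in range(num_ticks):
        obs = tick  # hypothetical stand-in for a real robot observation
        if tick % ratio == 0:
            lower_action = lower_policy(obs)  # slow agent: 50 Hz
        upper_action = upper_policy(obs)      # fast agent: 100 Hz
        log.append((tick, upper_action, lower_action))
    return log


# Hypothetical placeholder policies that only tag the tick they acted on.
def upper_policy(obs: int) -> str:
    return f"arm@{obs}"


def lower_policy(obs: int) -> str:
    return f"leg@{obs}"


log = run_two_rate_loop(upper_policy, lower_policy, num_ticks=4)
for entry in log:
    print(entry)
```

In this sketch the arm action changes on every tick while the leg action changes only on even ticks, which is the frequency separation the abstract describes in its simplest form.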
Submission Number: 18