Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids

Kaizhe Hu; Haochen Shi; Weizhuo Wang; Yao He; Karen Liu; Shuran Song

Published: 18 Jun 2025, Last Modified: 18 Jun 2025

Venue: RSS 2025 Hardware Intelligence Oral

License: CC BY 4.0

Keywords: Humanoid Robots, Sim-to-Real Adaptation, Real-World RL

TL;DR: Robot-Trains-Robot uses a robot arm teacher to actively train a humanoid student, aiming for practical and highly efficient real-world humanoid adaptation and learning.

Abstract: Simulation-based reinforcement learning (RL) has significantly advanced humanoid locomotion tasks, yet direct real-world RL from scratch or starting from pretrained policies remains rare, limiting the full potential of humanoid robots. Real-world training, despite being crucial for overcoming the sim-to-real gap, faces substantial challenges related to safety, reward design, and learning efficiency. To address these limitations, we propose Robot-Trains-Robot (RTR), a novel framework where a robotic arm teacher actively supports and guides a humanoid student robot. The RTR system provides protection, schedule, reward, perturbation, failure detection, and automatic resets, enabling efficient long-term real-world training with minimal human intervention. Furthermore, we propose a novel RL pipeline that facilitates and stabilizes sim-to-real transfer by optimizing a single dynamics-encoded latent variable in the real world. We validate our method through two challenging real-world humanoid tasks: fine-tuning a walking policy for precise speed tracking and learning a humanoid swing-up task from scratch, illustrating the promising capabilities of real-world humanoid learning realized by RTR-style systems.

Submission Number: 4
