Keywords: Text-to-motion generation, Humanoid whole-body control, Reinforcement learning fine-tuning of LLMs
Abstract: This paper addresses a critical challenge in robotics: translating human motions generated from text descriptions into executable actions for humanoid robots.
While existing text-to-motion generation methods achieve semantic alignment between language and motion, they often produce kinematically or physically infeasible motions unsuitable for real-world deployment.
To bridge the gap between motion generation and humanoid execution, we propose \textbf{Reinforcement Learning from Physical Feedback (RLPF)}, a novel framework that integrates physics-aware motion evaluation with text-conditioned motion generation.
RLPF employs a motion tracking policy to assess the physical feasibility of generated motions in a simulator, producing rewards that are used to fine-tune the motion generator.
Furthermore, RLPF introduces an alignment verification module to preserve semantic alignment with the text instructions.
This joint optimization ensures both physical feasibility and instruction alignment.
Extensive experiments show that RLPF substantially outperforms baseline methods in generating physically feasible motions while maintaining semantic alignment with text instructions, enabling successful deployment on real humanoid robots.
More visualizations are available at \href{https://anonymous.4open.science/r/RLPF/}{https://anonymous.4open.science/r/RLPF/}.
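To make the described training loop concrete, below is a minimal, hypothetical sketch of an RLPF-style update, assuming a REINFORCE-style policy gradient over sampled motions. All component names and methods (\texttt{generator.sample}, \texttt{tracker.tracking\_reward}, \texttt{verifier.alignment\_score}) and the reward weights are illustrative assumptions, not the paper's actual API or algorithm.

```python
import torch  # quantities below are assumed to be PyTorch tensors

# Hypothetical components (names are illustrative):
#   generator: text-conditioned motion generator being fine-tuned
#   tracker:   pretrained motion tracking policy run in a physics simulator
#   verifier:  alignment module scoring text-motion semantic consistency

def rlpf_step(generator, tracker, verifier, texts, optimizer,
              w_phys=1.0, w_align=0.5):
    """One fine-tuning update: reward combines physical feasibility
    (from the tracking policy) and semantic alignment (from the verifier)."""
    # Sample motions for the batch of instructions, with log-likelihoods.
    motions, log_probs = generator.sample(texts)

    # Physical feasibility: how well the tracking policy reproduces each
    # motion in simulation (e.g., a tracking-error-based reward), shape (B,).
    r_phys = tracker.tracking_reward(motions)

    # Semantic alignment: similarity between instruction and motion, shape (B,).
    r_align = verifier.alignment_score(texts, motions)

    reward = w_phys * r_phys + w_align * r_align
    advantage = reward - reward.mean()  # batch-mean baseline for variance reduction

    # REINFORCE-style loss; detach the advantage so gradients flow only
    # through the generator's log-probabilities.
    loss = -(advantage.detach() * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The detached, mean-centered advantage is one simple design choice for stabilizing the policy-gradient update; the joint reward reflects the abstract's two objectives, physical feasibility and instruction alignment.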
Primary Area: applications to robotics, autonomy, planning
Submission Number: 6932