Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions

Zhenyu Jiang; Yuqi Xie; Jinhan Li; Ye Yuan; Yifeng Zhu; Yuke Zhu

Published: 05 Sept 2024, Last Modified: 08 Nov 2024, Venue: CoRL 2024, License: CC BY 4.0

Keywords: Humanoid Robot, Whole-Body Motion Generation

TL;DR: Humanoid motion generation with human motion prior and VLM-based motion refinement.

Abstract: Humanoid robots, with their human-like embodiment, have the potential to integrate seamlessly into human environments. Critical to their coexistence and cooperation with humans is the ability to understand natural language communications and exhibit human-like behaviors. This work focuses on generating diverse whole-body motions for humanoid robots from language descriptions. We leverage human motion priors from extensive human motion datasets to initialize humanoid motions and employ the commonsense reasoning capabilities of Vision Language Models (VLMs) to edit and refine these motions. Our approach demonstrates the capability to produce natural, expressive, and text-aligned humanoid motions, validated through both simulated and real-world experiments. More videos can be found on our website https://ut-austin-rpl.github.io/Harmon/.

Supplementary Material: zip

Website: https://ut-austin-rpl.github.io/Harmon/

Publication Agreement: pdf

Student Paper: yes

Spotlight Video: mp4

Submission Number: 497
