MimicAgent: Learning Quadruped Skills via Text-to-Trajectory Generation

Published: 05 Mar 2026, Last Modified: 09 Mar 2026 · ICLR 2026 Workshop RSI Poster · CC BY 4.0
Keywords: Coding Agents, Quadrupeds, Skill Learning, Example-Guided RL
TL;DR: We propose an agentic pipeline for learning quadruped skills via trajectory synthesis.
Abstract: We present MimicAgent, a text-to-trajectory generation framework for learning dynamic quadruped skills. Although reward shaping is extensively used when training quadruped policies, navigating the resulting reward landscape is notoriously difficult, requiring hours of "graduate student descent". Eureka attempts to automate reward design with LLMs, but we find that it struggles to generalize across diverse skills and morphologies. Motivated by the success of example-guided RL for humanoids, we revisit skill learning from demonstrations for quadrupeds. Unlike humanoids, which can exploit large-scale motion capture datasets, quadrupeds lack such reference motion data. We observe that manually keyframing quadruped reference motions can be more intuitive than reward shaping; in particular, we find that rather coarse and even dynamically infeasible motions can still be effective reference targets for example-guided RL. However, manual keyframing remains too cumbersome for creating large-scale skill libraries. To address this challenge, we propose an LLM-based pipeline that generates kinematically feasible quadruped trajectories for diverse skills. Although these trajectories are not dynamically feasible, we show that they are sufficient to train successful policies. Across all evaluated skills, human raters consistently prefer policies generated by MimicAgent over those produced by Eureka.
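The core idea in the abstract, turning coarse keyframes into a dense reference trajectory that an example-guided RL reward can track, can be illustrated with a minimal sketch. All names and steps here are our assumptions for illustration, not the authors' implementation: we stub the LLM-generated keyframes as a small array, linearly interpolate them into a kinematically smooth trajectory, and define a simple exponential tracking reward.

```python
import numpy as np

def keyframes_to_trajectory(keyframes: np.ndarray,
                            steps_per_segment: int = 10) -> np.ndarray:
    """Linearly interpolate coarse keyframes of shape (K, dof) into a
    dense reference trajectory (hypothetical stand-in for the LLM pipeline)."""
    segments = []
    for a, b in zip(keyframes[:-1], keyframes[1:]):
        t = np.linspace(0.0, 1.0, steps_per_segment, endpoint=False)[:, None]
        segments.append((1 - t) * a + t * b)
    segments.append(keyframes[-1:])  # include the final keyframe exactly
    return np.concatenate(segments, axis=0)

def tracking_reward(state: np.ndarray, reference: np.ndarray) -> float:
    """Example-guided RL imitation reward: exp of negative squared
    tracking error, maximal (1.0) when the state matches the reference."""
    return float(np.exp(-np.sum((state - reference) ** 2)))

# Coarse, possibly dynamically-infeasible keyframes for a toy 2-DoF "hop".
keyframes = np.array([[0.0, 0.0],
                      [0.5, 1.0],
                      [0.0, 0.0]])
traj = keyframes_to_trajectory(keyframes, steps_per_segment=5)
```

The point mirrors the paper's observation: the reference need only be kinematically plausible; the RL policy absorbs the gap to dynamic feasibility through the tracking reward.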
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 105