Flow Policy Gradients for Legged Robots

ICLR 2026 Conference Submission 21209 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: reinforcement learning, flow matching
TL;DR: We optimize flow matching policies for legged robot control tasks
Abstract: We study robot control with flow policy optimization (FPO), an online reinforcement learning algorithm for flow-based action distributions. We demonstrate how FPO can succeed on more difficult continuous control tasks than shown in prior work, using a set of design choices that reduce gradient variance and regularize entropy. We show that these design choices mitigate the policy collapse faced by the original FPO algorithm, and we use the resulting algorithm, FPO++, to train flow policies for legged robot locomotion and humanoid motion tracking. We find that FPO++ is stable to train, interpretably models cross-action correlations, and can be deployed to real humanoid robots.
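To make the notion of a flow-based action distribution concrete, here is a minimal sketch of how such a policy produces an action: a learned velocity field is Euler-integrated from Gaussian noise at t=0 to an action sample at t=1. The function names and the toy velocity field below are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def sample_flow_action(velocity_field, action_dim, n_steps=10, rng=None):
    """Sample an action by Euler-integrating a velocity field
    from Gaussian noise (t=0) to an action sample (t=1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(action_dim)  # start from noise
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity_field(x, t)  # Euler step along the flow
    return x

# Toy stand-in for a trained network: a field that pulls samples
# toward a fixed target mean (purely illustrative).
target = np.array([0.5, -0.3])
toy_velocity = lambda x, t: target - x

action = sample_flow_action(toy_velocity, action_dim=2,
                            rng=np.random.default_rng(0))
```

In an actual FPO-style policy, `velocity_field` would be a neural network conditioned on the robot's observation, and the integration would run once per control step to emit a joint-space action.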
Primary Area: applications to robotics, autonomy, planning
Submission Number: 21209