EvoControl: Multi-Frequency Bi-Level Control for High-Frequency Continuous Control

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: EvoControl marries a slow PPO high-level policy with an evolution-strategies low-level controller to learn fast, adaptive, tune-free high-frequency continuous control that outperforms both PD-based hierarchies and direct torque baselines.
Abstract: High-frequency control in continuous action and state spaces is essential for practical applications in the physical world. Directly applying end-to-end reinforcement learning to high-frequency control tasks struggles with assigning credit to actions across long temporal horizons, compounded by the difficulty of efficient exploration. The alternative, learning low-frequency policies that guide higher-frequency controllers (e.g., proportional-derivative (PD) controllers), can limit the total expressiveness of the combined control system, hindering overall performance. We introduce *EvoControl*, a novel bi-level policy learning framework that learns both a slow high-level policy (using PPO) and a fast low-level policy (using Evolution Strategies) for continuous control tasks. Training the low-level policy with Evolution Strategies allows robust learning over the long horizons that crucially arise when operating at higher frequencies. This enables *EvoControl* to control interactions at high frequency, benefiting from more efficient exploration and credit assignment than direct high-frequency torque control, without the need to hand-tune PD parameters. We empirically demonstrate that *EvoControl* achieves higher evaluation reward on continuous-control tasks than existing approaches, excelling in particular on tasks where high-frequency control is needed, such as those requiring safety-critical fast reactions.
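To make the bi-level scheme concrete, here is a minimal Python sketch of the control loop and an Evolution Strategies update on the low-level parameters. It is illustrative only, not the paper's implementation: the dimensions, the `env_step` and `high_level_policy` stand-ins, the inner-loop ratio `K`, and the linear low-level controller are all hypothetical, and the real high-level policy is trained with PPO (omitted here).

```python
import numpy as np

# Hypothetical dimensions and frequency ratio; the paper's tasks are
# MuJoCo-style continuous control, where these vary per task.
STATE_DIM, GOAL_DIM, ACTION_DIM = 8, 4, 2
K = 50  # low-level steps per high-level step (e.g., fast inner loop vs. slow outer loop)

def low_level_policy(params, state, goal):
    """Fast learned controller mapping (state, goal) to torque. Here it is
    just a tanh-squashed linear layer for illustration; the paper evolves a
    neural controller that can start out behaving like a PD controller."""
    W = params.reshape(ACTION_DIM, STATE_DIM + GOAL_DIM)
    return np.tanh(W @ np.concatenate([state, goal]))

def rollout(params, env_step, init_state, high_level_policy, horizon=10):
    """Total episode reward under the bi-level scheme: the slow policy emits
    a goal once per outer step; the fast policy acts K times beneath it."""
    state, total = init_state, 0.0
    for _ in range(horizon):               # slow outer loop (PPO in the paper)
        goal = high_level_policy(state)
        for _ in range(K):                 # fast inner loop at high frequency
            action = low_level_policy(params, state, goal)
            state, reward = env_step(state, action)
            total += reward
    return total

def es_update(params, fitness_fn, rng, pop=16, sigma=0.1, lr=0.02):
    """One Evolution Strategies step (OpenAI-ES style): antithetic Gaussian
    perturbations and a fitness-weighted gradient estimate, assigning credit
    over the whole high-frequency episode at once."""
    eps = rng.standard_normal((pop, params.size))
    f = np.array([fitness_fn(params + sigma * e) - fitness_fn(params - sigma * e)
                  for e in eps])
    return params + lr * (eps.T @ f) / (2 * pop * sigma)

# Toy usage with a stand-in point-mass environment and a frozen high level.
rng = np.random.default_rng(0)
def env_step(s, a):  # hypothetical dynamics: drift toward the action, cost = |s|
    s = s + 0.01 * np.pad(a, (0, STATE_DIM - ACTION_DIM))
    return s, -float(np.linalg.norm(s))
high = lambda s: s[:GOAL_DIM]  # placeholder for the learned PPO policy
theta = 0.1 * rng.standard_normal(ACTION_DIM * (STATE_DIM + GOAL_DIM))
for _ in range(5):
    theta = es_update(theta, lambda p: rollout(p, env_step, np.ones(STATE_DIM), high), rng)
```

Because ES scores each perturbed low-level controller by its whole-episode return, it sidesteps per-step credit assignment at the fast timescale, which is the motivation the abstract gives for using it at the lower level.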
Lay Summary: Robots often need to react in a split second when they bump into something or when conditions suddenly change. Today’s learning-based controllers struggle to make such lightning-fast decisions, while the hand-tuned “reflex” loops engineers add underneath can be hard to adjust and are not very flexible. We introduce EvoControl, a two-layer control system that behaves a bit like a human driver with quick reflexes and slower strategic thinking. The top layer plans only a few times per second, deciding roughly what the robot should do next. Beneath it, a second layer acts hundreds of times per second, handling the fine-grained muscle work. At first, this lower layer is a familiar, safe controller, but, as training progresses, our algorithm gradually “evolves” it into a learned neural reflex that can outperform the original hand-tuned version. Across a dozen simulated tasks and real-robot tests, EvoControl let machines move more smoothly and adapt faster to surprises, while requiring far less manual tweaking. This approach could make future assistive robots, factory arms, and autonomous vehicles safer and more reliable.
Primary Area: Reinforcement Learning->Everything Else
Keywords: Hierarchical Reinforcement Learning, Evolutionary Strategies, High-Frequency Control, Continuous Control, Robotics
Submission Number: 6335