Inherently Robust Control through Maximum-Entropy Learning-Based Rollout

TMLR Paper 5552 Authors

05 Aug 2025 (modified: 06 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: Reinforcement learning has recently proven extremely successful in the context of robot control. One of the major reasons is massively parallel simulation in conjunction with controlling for the so-called ``sim-to-real'' gap: training on a distribution of environments, which is assumed to contain the real one, suffices to find neural policies that transfer successfully from computer simulations to real robots. Often, this is accompanied by a layer of system identification during deployment to close the gap further. Still, the efficacy of these approaches hinges on adequate simulation capabilities, with a task distribution rich enough to contain the real environment. This work aims to provide a complementary solution for cases where these criteria are difficult to satisfy. We combine two approaches, $\textit{maximum-entropy reinforcement learning}$ (MaxEntRL) and $\textit{rollout}$, into an inherently robust control method called $\textbf{Maximum-Entropy Learning-Based Rollout (MELRO)}$. Each promises increased robustness and adaptability on its own: MaxEntRL has been shown to be an adversarially robust approach in disguise, while rollout substantially improves on a parametric base policy through an implicit Newton step performed on a model of the environment. We find that our approach performs excellently in the vast majority of cases, both on the Real World Reinforcement Learning (RWRL) benchmark and on our own environment perturbations of the popular DeepMind Control (DMC) suite, which move beyond simple parametric noise. We also demonstrate successful ``sim-to-real'' transfer with a Franka Panda robot arm.
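To make the rollout idea the abstract appeals to concrete, here is a minimal sketch (not the paper's exact algorithm) of one-step rollout over a stochastic MaxEnt-style base policy: candidate actions are sampled from the base policy, simulated one step ahead on an environment model, and scored with a learned value function. The `step_model`, `sample_action`, and `value_fn` interfaces below are illustrative stand-ins, not APIs from the paper.

```python
import numpy as np

def rollout_action(state, step_model, sample_action, value_fn,
                   num_samples=16, gamma=0.99):
    """One-step rollout: improve on a stochastic (MaxEnt-style) base
    policy by simulating each sampled candidate action one step ahead
    on a model of the environment and scoring it with the learned
    value function."""
    candidates = [sample_action(state) for _ in range(num_samples)]
    scores = []
    for a in candidates:
        next_state, reward = step_model(state, a)   # model lookahead
        scores.append(reward + gamma * value_fn(next_state))
    return candidates[int(np.argmax(scores))]       # greedy over lookahead

# Toy usage: 1-D point mass that should move toward the origin.
rng = np.random.default_rng(0)
step_model = lambda s, a: (s + a, -abs(s + a))       # dynamics + reward model
sample_action = lambda s: rng.normal(-0.5 * s, 0.3)  # stochastic base policy
value_fn = lambda s: -abs(s)                         # stand-in for a learned soft value
print(rollout_action(2.0, step_model, sample_action, value_fn))
```

The entropy of the base policy keeps the candidate set diverse, while the one-step maximization is the policy-improvement step that the rollout literature interprets as an implicit Newton step.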
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Razvan_Pascanu1
Submission Number: 5552