Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Dyna-style synthetic data helps in OpenAI Gym but hurts in the DeepMind Control Suite, even though both benchmarks use MuJoCo and identical hyperparameters, highlighting key cross-benchmark challenges in model-based RL that modern methods do not alleviate.
Abstract: Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in control tasks in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process --- the backbone of Dyna-style algorithms --- significantly degrades performance across most DMC environments. Our findings contribute to a deeper understanding of several fundamental challenges in model-based RL and show that, like many optimization fields, there is no free lunch when evaluating performance across diverse benchmarks in RL.
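The Dyna-style recipe the abstract refers to can be summarized as: train a one-step dynamics model on real transitions, branch short synthetic rollouts from states in the real replay buffer, and update an off-policy learner on a mix of real and synthetic data. Below is a minimal, illustrative Python sketch of that loop. The class and function names (ReplayBuffer, RandomAgent, train_dynamics_model, dyna_style_update) and all hyperparameter values are placeholders chosen for exposition, not the paper's actual implementation.

```python
# Minimal Dyna-style (MBPO-like) training-loop sketch.
# Everything here is a simplified stand-in; the real method would use a
# learned ensemble dynamics model and an off-policy agent such as SAC.
import random
import numpy as np


class ReplayBuffer:
    """FIFO store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=100_000):
        self.storage, self.capacity = [], capacity

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, min(batch_size, len(self.storage)))


class RandomAgent:
    """Stand-in for an off-policy learner such as SAC (illustrative only)."""

    def act(self, state):
        return np.random.uniform(-1.0, 1.0, size=2)

    def update(self, batch):
        pass  # a real agent would take gradient steps on this mixed batch


def train_dynamics_model(real_buffer):
    """Placeholder for fitting a one-step model p(s', r | s, a) on real data."""
    def model(state, action):
        next_state = state + 0.01 * np.random.randn(*state.shape)
        reward = float(-np.linalg.norm(next_state))
        return next_state, reward
    return model


def dyna_style_update(agent, env_step, real_buffer, model_buffer,
                      rollout_length=1, num_branches=32, synthetic_ratio=0.95):
    # 1) Store the latest real environment transition.
    real_buffer.add(env_step)

    # 2) Refit the dynamics model and branch short synthetic rollouts
    #    from states sampled out of the real buffer.
    model = train_dynamics_model(real_buffer)
    for state, *_rest in real_buffer.sample(num_branches):
        s = state
        for _ in range(rollout_length):
            a = agent.act(s)
            s_next, r = model(s, a)
            model_buffer.add((s, a, r, s_next))
            s = s_next

    # 3) Update the agent on a mix of synthetic and real transitions;
    #    the synthetic-to-real ratio is a key hyperparameter.
    n_synth = int(256 * synthetic_ratio)
    batch = model_buffer.sample(n_synth) + real_buffer.sample(256 - n_synth)
    agent.update(batch)


if __name__ == "__main__":
    agent = RandomAgent()
    real_buf, model_buf = ReplayBuffer(), ReplayBuffer()
    s = np.zeros(4)
    for _ in range(10):
        a = agent.act(s)
        s_next, r = s + 0.1 * a.sum(), -1.0  # toy stand-in for a real env step
        dyna_style_update(agent, (s, a, r, s_next), real_buf, model_buf)
        s = s_next
```

The rollout length and the synthetic-to-real ratio in step 3 are the knobs such methods typically tune; the sketch only shows where synthetic rollouts enter the training process, which is the component the paper finds can degrade performance in DMC.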
Lay Summary: Many AI systems learn by trial and error, using simulations to practice before making real-world decisions. A popular technique to speed up this learning process is to let algorithms imagine "what-if" scenarios using a model of the world. This idea, called model-based reinforcement learning, is supposed to make learning more efficient by generating synthetic training data. However, our research found that this approach doesn't always work as expected. We compared its performance on two popular testing platforms for robotic control tasks, OpenAI Gym and the DeepMind Control Suite, which have similar physics and task types. Surprisingly, model-based methods performed well in Gym but often failed in the DeepMind environments. We investigated why this gap exists and found that adding these "what-if" experiences, the core idea of this technique, can sometimes hurt performance. Our findings challenge the assumption that model-based learning always improves efficiency, and they highlight the need for more robust techniques that work consistently across different environments.
Link To Code: https://github.com/CLeARoboticsLab/STFL
Primary Area: Reinforcement Learning->Deep RL
Keywords: model-based reinforcement learning, online reinforcement learning, deep reinforcement learning
Submission Number: 2256