Improving Model-Based Reinforcement Learning by Converging to Flatter Minima

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · License: CC BY-NC-SA 4.0
Keywords: Reinforcement Learning, Model-Based Reinforcement Learning, Flat Minima, Sharpness Aware Minimization
TL;DR: Encouraging the world model in model-based reinforcement learning to converge to flatter minima of its training loss yields better downstream policies.
Abstract: Model-based reinforcement learning (MBRL) hinges on a learned dynamics model whose errors can compound along imagined rollouts. We study how encouraging \emph{flatness} in the model’s training loss affects downstream control, and show that steering optimization toward flatter minima yields better policies. Concretely, we integrate \emph{Sharpness-Aware Minimization} (SAM) into world-model training as a drop-in objective, leaving the planner and policy components unchanged. On the theory side, we derive PAC-Bayesian bounds that link first-order sharpness to the value-estimation gap and to the performance gap between model-optimal and true-optimal policies, implying that flatter minima tighten both. Empirically, SAM reduces measured sharpness and value-prediction error and improves returns across HumanoidBench, Atari-100k, and high-DoF DeepMind Control tasks. Augmenting existing MBRL algorithms with SAM increases mean return, with especially large gains in settings with high-dimensional state–action spaces. We further observe positive transfer across algorithms and input modalities, including a transformer-based world model. These results position flat-minima training as a simple, general mechanism for more robust MBRL without architectural changes.
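At its core, the SAM objective described in the abstract replaces the plain world-model loss $L(w)$ with $\min_w \max_{\|\epsilon\|_2 \le \rho} L(w+\epsilon)$, implemented as a two-step update: perturb the weights toward steepest ascent within an L2 ball of radius $\rho$, then apply the gradient computed at the perturbed point back at the original weights. Below is a minimal sketch of such an update, assuming a PyTorch setup; the `sam_step` helper, the toy dynamics model, and the value `rho=0.05` are illustrative placeholders and not the paper's implementation.

```python
# Minimal sketch (not the paper's code): one SAM update on a generic world-model loss,
# following the standard two-step procedure of Foret et al. (2021).
import torch
import torch.nn as nn

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    # 1) Ascent step: perturb weights toward higher loss inside an L2 ball of radius rho.
    loss = loss_fn(model, batch)
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)                 # move to the perturbed point w + eps
            eps.append(e)
    optimizer.zero_grad()

    # 2) Descent step: gradient of the loss at the perturbed weights,
    #    applied back at the original weights.
    loss_perturbed = loss_fn(model, batch)
    loss_perturbed.backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)             # restore original weights before the optimizer step
    optimizer.step()
    optimizer.zero_grad()
    return loss_perturbed.item()

# Example usage with a toy dynamics model predicting next state from (state, action).
model = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def dynamics_loss(m, b):
    state_action, next_state = b
    return nn.functional.mse_loss(m(state_action), next_state)

batch = (torch.randn(32, 6), torch.randn(32, 4))
sam_step(model, dynamics_loss, batch, opt)
```

Because the change is confined to the model-training loop, the planner and policy-learning components can consume the world model exactly as before, which is what makes the objective a drop-in replacement.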
Supplementary Material: zip
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 8454