Abstract: Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe, but it often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel _doubly Bayesian_ offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations and incorporates this uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data and remains resilient to changing environment dynamics, thereby improving the flexibility, generalizability, and robustness of offline-learned policies.
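To make the _doubly Bayesian_ formulation concrete, the following is a minimal illustrative sketch in standard Bayes-adaptive notation; the symbols $\theta$, $b_t$, $H$, and $\gamma$ are introduced here for exposition and are not necessarily the paper's exact notation. At deployment, the belief over dynamics parameters $\theta$ is updated from the transitions observed so far, and model-based planning scores candidate actions by marginalizing imagined rollouts over that belief:

$$
b_t(\theta) \;\propto\; p(\theta) \prod_{k=0}^{t-1} p_\theta(s_{k+1} \mid s_k, a_k),
\qquad
a_t \in \arg\max_{a}\; \mathbb{E}_{\theta \sim b_t}\!\left[\, \mathbb{E}_{p_\theta,\,\pi}\!\left[ \sum_{k=0}^{H-1} \gamma^{k} r(s_{t+k}, a_{t+k}) \;\middle|\; s_t,\, a_t = a \right] \right].
$$

Planning thus averages model rollouts over the posterior rather than committing to a single point estimate of the dynamics, which is the marginalization referred to in the abstract.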
Lay Summary: Imagine teaching an AI to perform a task, like navigating a building, using only a fixed set of recorded examples. When faced with a new situation it hasn't seen before, the AI can become confused and make poor decisions because its knowledge is incomplete. Many existing approaches make the AI overly cautious to avoid mistakes, but this prevents it from adapting effectively.

We introduce a new method called Reflect-then-Plan (RefPlan) that helps an AI reason intelligently about what it doesn't know. Our method works in two steps:
* Reflect: As the AI operates, it continuously "reflects" on its recent experiences—the actions it took and what happened as a result—to update its understanding of the specific environment it's currently in.
* Plan: When "planning" its next move, it doesn't rely on a single, rigid prediction of the future. Instead, it considers a range of possible scenarios based on its uncertainty, making its strategy more robust to the unexpected.
Our results show that this approach significantly improves the AI's performance, making it more flexible and resilient, especially when faced with unfamiliar situations, limited data, or changing conditions.
Link To Code: https://github.com/jihwan-jeong/offline-rl
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: Offline reinforcement learning, Model-based planning, Bayesian inference, Bayesian reinforcement learning
Submission Number: 11881