Online Robust Reinforcement Learning Through Monte-Carlo Planning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We present a Monte-Carlo tree search based planning algorithm for the online robust RL problem.
Abstract: Monte Carlo Tree Search (MCTS) is a powerful framework for solving complex decision-making problems, yet it often relies on the assumption that the simulator and the real-world dynamics are identical. Although this assumption underlies the success of MCTS in games like Chess, Go, and Shogi, real-world scenarios introduce ambiguity due to modeling mismatches in low-fidelity simulators. In this work, we present a new robust variant of MCTS that mitigates ambiguity in the dynamical model. Our algorithm addresses ambiguity in both the transition dynamics and the reward distribution to bridge the gap between simulation-based planning and real-world deployment. We incorporate a robust power mean backup operator and carefully designed exploration bonuses to ensure finite-sample convergence at every node in the search tree. We show that our algorithm achieves a convergence rate of $\mathcal{O}(n^{-1/2})$ for the value estimation at the root node, comparable to that of standard MCTS. Finally, we provide empirical evidence that our method achieves robust performance in planning problems even under significant ambiguity in the underlying reward distribution and transition dynamics.
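The abstract names two ingredients: a robust power mean backup operator and exploration bonuses. The paper defines these precisely, but a minimal Python sketch of the general idea may help. Here `power_mean` is the standard weighted power mean; `robust_power_mean_backup`, the `radius` penalty, and `selection_score` are hypothetical stand-ins for the paper's operator and bonuses, not the actual implementation (see the linked repository for that).

```python
import numpy as np

def power_mean(values, weights, p):
    """Weighted power mean with exponent p >= 1 (values assumed >= 0).
    p = 1 gives the weighted average; p -> infinity approaches the max."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float((w * v**p).sum() ** (1.0 / p))

def robust_power_mean_backup(child_values, child_visits, p=2.0, radius=0.1):
    """Hypothetical robust backup at an internal node: aggregate child
    value estimates with a power mean, then subtract a penalty that grows
    with the ambiguity radius, standing in for the worst case over an
    uncertainty set of transition models. The penalty form and 'radius'
    are illustrative, not the paper's exact operator."""
    aggregated = power_mean(child_values, child_visits, p)
    spread = max(child_values) - min(child_values)  # crude sensitivity proxy
    return aggregated - radius * spread

def selection_score(child_value, child_visits, parent_visits, c=1.4):
    """UCT-style exploration bonus used during tree descent. The paper's
    bonuses are specifically designed for finite-sample convergence at
    every node; this is the standard form, shown for illustration only."""
    return child_value + c * np.sqrt(np.log(parent_visits) / child_visits)

# Toy usage: three children with different value estimates and visit counts.
vals, visits = [0.4, 0.7, 0.55], [10, 25, 5]
print(robust_power_mean_backup(vals, visits))
```

One reason the power mean is a natural backup here: as p moves from 1 toward infinity it interpolates between plain averaging (stable, low-variance estimates) and maximization (the optimism standard MCTS relies on), so a single exponent trades off the two.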
Lay Summary: Imagine you're learning to play a video game by practicing on a simulator, but when you finally play the real game, the physics are slightly different: maybe the character jumps a bit lower or moves a bit slower than in the simulator. This gap between practice and reality is a major challenge in artificial intelligence, where computer programs often train in simplified virtual environments before being deployed in the messy real world.

This paper tackles this "simulation-to-reality gap" by making AI planning algorithms more robust, meaning they work well even when the real world differs from the training environment. The researchers focus on a popular AI technique called Monte Carlo Tree Search (MCTS), which is like playing out thousands of possible future scenarios in your head before making a decision. Think of MCTS as a chess player who considers many possible moves and counter-moves before choosing their next play. The difference here is that instead of assuming the game rules are perfectly known, the algorithm plans for uncertainty: it considers that the "rules" of the real world might be somewhat different from what it learned in simulation.

The key innovation is building uncertainty directly into the decision-making process. Instead of assuming the best-case scenario, the algorithm prepares for reasonable worst-case scenarios. It's like a cautious driver who plans their route assuming there might be unexpected traffic, rather than optimistically assuming clear roads. The algorithm does this by considering multiple possible versions of how the world might behave, making decisions that work well across all these possibilities, and balancing between being too cautious and being too optimistic.

This research is important because it helps bridge the gap between AI systems that work perfectly in labs and AI systems that work reliably in the real world. Applications could include autonomous vehicles that handle unexpected road conditions, medical treatment planning that accounts for patient variability, financial trading systems that remain stable during market volatility, and robots that adapt when the real environment differs from simulation.

The researchers proved mathematically that their robust algorithm maintains the same learning speed as traditional methods while being much more reliable when faced with unexpected conditions. They tested it in several scenarios, including gambling problems and navigation tasks, showing that the robust approach maintains steady performance even when the real environment differs significantly from what was expected.

This work represents a step toward AI systems that are not just smart, but also reliable and trustworthy in real-world deployment. By explicitly planning for uncertainty rather than ignoring it, we can build AI that performs consistently across the messy, unpredictable conditions of the real world.
Link To Code: https://github.com/brahimdriss/RobustMCTS
Primary Area: Reinforcement Learning->Planning
Keywords: Monte-Carlo tree search, distributionally robust reinforcement learning, online reinforcement learning
Submission Number: 15297