Keywords: deep reinforcement learning, stochastic AlphaZero, stochastic environments, search-based planning, MCTS
TL;DR: We propose Robust Stochastic Zero, which improves planning robustness in stochastic environments through minimal adversarial interventions at the most catastrophic chance events.
Abstract: Planning in stochastic environments is an active research area whose robustness remains underexplored, despite its importance for real-world applications. To achieve robustness, previous methods typically assume an aggressive adversary that attacks the agent constantly, which limits the agent's ability to learn and distorts the stochastic dynamics of chance events. In addition, expectation-centric methods implicitly discount low-probability events, failing to address rare catastrophes. To address this, we introduce Robust Stochastic Zero, the first method that is catastrophe-aware while preserving the environment's inherent stochastic dynamics. Specifically, it replaces the environment with a lurking adversary that mostly preserves the dynamics but selectively intervenes at the most critical moments. By targeting rare catastrophic chance events through tree-based planning, our method enables the agent to anticipate and avoid risky decisions, and it also yields an adversary capable of delivering malicious impact with minimal intervention. On two benchmark stochastic environments, 2048 and Tetris Block Puzzle, Robust Stochastic Zero achieves an average of 122.1% of the baseline performance across both environments while intervening in only 0.05% of chance events, and remains comparable to the baseline when no interventions occur. Our findings demonstrate that targeted rather than constant intervention is a promising direction for robust planning in stochastic environments.
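The core idea of a "lurking adversary" that preserves the environment's chance dynamics but intervenes only at critical moments can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function name, the value-gap intervention criterion, and the `gap_threshold` parameter are all hypothetical assumptions for exposition.

```python
import random

def lurking_adversary_sample(outcomes, probs, values, gap_threshold=0.5, rng=random):
    """Resolve a chance node either stochastically or adversarially.

    outcomes: possible chance events at this node
    probs:    their true environment probabilities
    values:   the agent's estimated value after each outcome (higher = better)

    Returns (chosen_outcome, intervened). All details here are
    illustrative; the paper's criterion for "critical" events may differ.
    """
    # Expected value under the true stochastic dynamics.
    expected = sum(p * v for p, v in zip(probs, values))
    worst_idx = min(range(len(values)), key=lambda i: values[i])

    # Intervene only when the worst-case outcome is catastrophic
    # relative to the expectation, i.e. at rare but critical events.
    if expected - values[worst_idx] > gap_threshold:
        return outcomes[worst_idx], True  # adversarial intervention

    # Otherwise preserve the environment's inherent stochastic dynamics.
    return rng.choices(outcomes, weights=probs)[0], False
```

Under this sketch, a benign chance node (all outcomes of similar value) is always resolved by true-probability sampling, while a node hiding a rare catastrophe is forced to its worst outcome, letting the agent learn to avoid states where such interventions are possible.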
Primary Area: reinforcement learning
Submission Number: 5529