BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

Published: 18 Sept 2025, Last Modified: 29 Oct 2025
Venue: NeurIPS 2025 poster
License: CC BY 4.0
Keywords: Offline Reinforcement Learning, Batch Reinforcement Learning, Combinatorial Action Spaces, Reinforcement Learning, Discrete Action Spaces, Large Action Spaces, Sequential Decision Making
Abstract: Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20\times$ in environments with over four million actions.
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 18100
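
Illustration: the abstract contrasts exhaustive evaluation of an exponentially large joint action space with evaluating only a linear number of joint actions. The Python sketch below makes that counting argument concrete. It is not the BraVE algorithm: binary sub-actions, the `toy_q` value function, and the greedy one-sub-action-at-a-time traversal are all assumptions chosen for illustration. Under the binary assumption, 22 sub-actions would give 4,194,304 joint actions, which is one way to exceed four million.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Action = Tuple[int, ...]


def joint_action_count(num_sub_actions: int, choices: int = 2) -> int:
    """Size of the joint action space: choices ** num_sub_actions."""
    return choices ** num_sub_actions


def toy_q(action: Sequence[int]) -> float:
    """Hypothetical value function (not from the paper).

    Each active sub-action adds 1, but two adjacent active sub-actions
    incur a penalty of 1.5, so the value does not factorize per sub-action.
    """
    total = float(sum(action))
    for left, right in zip(action, action[1:]):
        total -= 1.5 * left * right
    return total


def exhaustive_search(q: Callable[[Sequence[int]], float], n: int) -> Tuple[Action, int]:
    """Score every joint action: 2 ** n evaluations of q."""
    candidates = list(product((0, 1), repeat=n))
    return max(candidates, key=q), len(candidates)


def greedy_traversal(q: Callable[[Sequence[int]], float], n: int) -> Tuple[Action, int]:
    """Assumed greedy traversal: n + 1 evaluations of q.

    Starting from the all-zeros action, try activating one sub-action at a
    time and keep the change only if the *joint* value strictly improves.
    Each evaluation scores a complete joint action, so interactions with
    earlier choices are taken into account. Not guaranteed to be optimal.
    """
    current: List[int] = [0] * n
    best_value = q(current)
    evaluations = 1
    for i in range(n):
        current[i] = 1
        value = q(current)
        evaluations += 1
        if value > best_value:
            best_value = value
        else:
            current[i] = 0  # revert: activating sub-action i did not help
    return tuple(current), evaluations


if __name__ == "__main__":
    print(joint_action_count(22))        # joint actions for 22 binary sub-actions
    print(exhaustive_search(toy_q, 4))   # scores all 16 joint actions
    print(greedy_traversal(toy_q, 4))    # scores only 5 joint actions
```

On this toy function the greedy traversal happens to reach the same value as exhaustive search, but in general such a greedy scheme can miss the best joint action; the sketch only shows how the evaluation count can drop from exponential to linear in the number of sub-actions.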