Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization

ICLR 2026 Conference Submission 18430 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: reinforcement learning, offline reinforcement learning, batch reinforcement learning, deep reinforcement learning, combinatorial action spaces, structured action spaces, discrete action spaces, representation learning
Abstract: Reinforcement learning in combinatorial action spaces must select multiple sub-actions simultaneously, a search over exponentially many joint actions in which only coherent combinations are valid. Existing approaches either simplify policy learning by assuming independence across sub-actions, which often yields incoherent or invalid actions when coordination is required, or attempt to learn action structure and control jointly, which is slow and unstable. We introduce Structured Policy Initialization (SPIN), a two-stage framework that first pre-trains an Action Structure Model (ASM) to capture the manifold of valid actions, then freezes this representation and trains lightweight policy heads for control. On challenging DM Control benchmarks, SPIN improves average return by up to $39\%$ over the state of the art while reducing time to convergence by up to $12.8\times$.
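The abstract specifies only the two-stage structure, not the ASM architecture or training objectives. Below is a minimal PyTorch sketch of that recipe under stated assumptions: the ASM is modeled as an action autoencoder, the policy head as a small MLP acting in the frozen latent space, and a behavior-cloning loss stands in for the offline RL objective. All names (ActionStructureModel, PolicyHead) and dimensions are hypothetical, not the paper's implementation.

```python
# Sketch of the two-stage SPIN recipe from the abstract (assumptions noted above).
import torch
import torch.nn as nn

ACTION_DIM = 24   # number of sub-action slots (assumed)
LATENT_DIM = 16   # dimensionality of the learned action manifold (assumed)
STATE_DIM = 32    # observation dimensionality (assumed)

class ActionStructureModel(nn.Module):
    """Stage 1: capture the manifold of valid joint actions (assumed autoencoder form)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))

    def forward(self, a):
        return self.decoder(self.encoder(a))

class PolicyHead(nn.Module):
    """Stage 2: lightweight head mapping states to latent actions; the ASM stays frozen."""
    def __init__(self, asm):
        super().__init__()
        self.asm = asm
        for p in self.asm.parameters():
            p.requires_grad_(False)       # freeze the pre-trained representation
        self.head = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))

    def forward(self, s):
        z = self.head(s)                  # act in the learned latent action space
        return self.asm.decoder(z)        # decode to a coherent joint action

# Stage 1: pre-train the ASM on joint actions from the offline dataset.
asm = ActionStructureModel()
opt = torch.optim.Adam(asm.parameters(), lr=1e-3)
actions = torch.randn(256, ACTION_DIM)   # placeholder for logged joint actions
for _ in range(100):
    loss = nn.functional.mse_loss(asm(actions), actions)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: train only the policy head; a behavior-cloning loss is a placeholder
# for whatever offline RL objective the paper actually uses.
policy = PolicyHead(asm)
opt = torch.optim.Adam(policy.head.parameters(), lr=1e-3)
states = torch.randn(256, STATE_DIM)     # placeholder for logged states
for _ in range(100):
    loss = nn.functional.mse_loss(policy(states), actions)
    opt.zero_grad(); loss.backward(); opt.step()
```

The key design point the sketch illustrates is the separation of concerns: only the small head is optimized during control, so the search happens in a low-dimensional latent space rather than over exponentially many discrete joint actions.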
Primary Area: reinforcement learning
Submission Number: 18430