- Abstract: Training good policies for large combinatorial action spaces is onerous and is usually tackled with imitation learning, curriculum learning, or reward shaping. Each of these methods has requirements that can hinder its general application. Here, we study how growing the action space of the policy during training can structure exploration and lead to convergence without any external data (imitation), with less control over the environment (curriculum), and with minimal reward shaping. We evaluate this approach on a challenging end-to-end, full-game army control task in StarCraft: Brood War, training policies from scratch through self-play. We grow the spatial resolution and the frequency of actions and achieve better results than training purely at the finer resolutions.
- Keywords: Reinforcement Learning, Real-Time Strategy Games, Hierarchical RL, Large Action Space
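The abstract does not say how a policy is carried over when its action space grows. One plausible mechanism is coarse-to-fine refinement of a spatial action grid. The sketch below illustrates that idea only; it is not presented as the authors' method. It splits every coarse cell into `factor * factor` fine cells and copies the coarse logit into each child. After a softmax over the whole grid, the total probability of each coarse region is therefore unchanged, so the grown policy initially behaves like the coarse one. A stage table then lowers the number of frames between actions as the grid is refined. The function names, the `STAGES` boundaries, the grid sizes, and the frame skips are all hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]

def softmax_grid(logits: Grid) -> Grid:
    """Softmax over all cells of a 2-D grid of logits (numerically stabilized)."""
    m = max(x for row in logits for x in row)
    exps = [[math.exp(x - m) for x in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def grow_logits(coarse: Grid, factor: int) -> Grid:
    """Hypothetical coarse-to-fine step: nearest-neighbour copy of logits.

    Each coarse cell becomes a factor x factor block of identical logits, so
    every fine cell gets 1/factor**2 of its parent's probability and each
    coarse region keeps its total probability mass.
    """
    fine: Grid = []
    for row in coarse:
        fine_row = [x for x in row for _ in range(factor)]  # widen columns
        for _ in range(factor):                              # repeat rows
            fine.append(list(fine_row))
    return fine

# Hypothetical growth schedule:
# (first training step, grid side length, frames between actions)
STAGES: List[Tuple[int, int, int]] = [(0, 4, 24), (100_000, 8, 12), (300_000, 16, 6)]

def stage_for(step: int) -> Tuple[int, int, int]:
    """Return the latest stage whose start step has been reached."""
    current = STAGES[0]
    for stage in STAGES:
        if step >= stage[0]:
            current = stage
    return current
```

For example, growing a 2x2 logit grid with `factor=2` yields a 4x4 grid, and the fine probabilities in each 2x2 block sum to the probability of the corresponding coarse cell. Copying logits is only one initialization choice. Learned upsampling or re-initializing the fine head would equally count as "growing" the action space.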