Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement
Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala
Nov 04, 2016 (modified: Mar 01, 2017)ICLR 2017 conference submissionreaders: everyone
Abstract:We consider scenarios from the real-time strategy game StarCraft as benchmarks for reinforcement learning algorithms. We focus on micromanagement, that is, the short-term, low-level control of team members during a battle. We propose several scenarios that are challenging for reinforcement learning algorithms because the state- action space is very large, and there is no obvious feature representation for the value functions. We describe our approach to tackle the micromanagement scenarios with deep neural network controllers from raw state features given by the game engine. We also present a heuristic reinforcement learning algorithm which combines direct exploration in the policy space and backpropagation. This algorithm collects traces for learning using deterministic policies, which appears much more efficient than, e.g., ε-greedy exploration. Experiments show that this algorithm allows to successfully learn non-trivial strategies for scenarios with armies of up to 15 agents, where both Q-learning and REINFORCE struggle.
TL;DR:We propose a new reinforcement learning algorithm based on zero order optimization, that we evaluate on StarCraft micromanagement scenarios.
Keywords:Deep learning, Reinforcement Learning, Games
Enter your feedback below and we'll get back to you as soon as possible.