Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Jun 11, 2018, ICML 2018 ECA Submission
  • Keywords: DRL, NLP, Action Elimination
  • TL;DR: Learning control policies from text, when there are thousands of actions to choose from, by learning how not to act
  • Abstract: Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.
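
The elimination idea in the abstract can be illustrated with a minimal Python sketch of action selection over a pruned action set. This is not the authors' implementation: the function name `select_action`, the `threshold` parameter, the epsilon-greedy rule, and the fallback when every action is eliminated are all assumptions made for illustration. The sketch supposes that one network supplies Q-values and a second one supplies, per action, a predicted probability that the action is invalid.

```python
import random
from typing import Optional, Sequence


def select_action(
    q_values: Sequence[float],
    elimination_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the abstract
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Epsilon-greedy choice restricted to actions not flagged as invalid."""
    if len(q_values) == 0:
        raise ValueError("q_values must not be empty")
    if len(q_values) != len(elimination_scores):
        raise ValueError("q_values and elimination_scores must have equal length")
    rng = rng or random.Random()

    # Keep only actions whose predicted probability of being invalid
    # is below the threshold.
    admissible = [a for a, s in enumerate(elimination_scores) if s < threshold]

    # Assumed fallback: if every action was eliminated, consider all of them.
    if not admissible:
        admissible = list(range(len(q_values)))

    # Explore uniformly among admissible actions with probability epsilon.
    if rng.random() < epsilon:
        return rng.choice(admissible)

    # Otherwise act greedily with respect to the Q-values of admissible actions.
    return max(admissible, key=lambda a: q_values[a])
```

For example, with Q-values `[1.0, 5.0, 3.0]` and elimination scores `[0.1, 0.9, 0.2]`, the highest-valued action (index 1) is pruned, so the greedy choice falls on index 2. Exploration is also confined to the admissible set, which is one plausible way such a mask could shorten learning when most actions are irrelevant.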