Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Tom Zahavy; Matan Haroush; Nadav Merlis; Daniel J. Mankowitz; Shie Mannor

Published: 27 Jun 2018, Last Modified: 05 May 2023

Venue: ICML 2018 ECA Submission

Readers: Everyone

Keywords: DRL, NLP, Action Elimination

TL;DR: Learning control policies from text, when there are thousands of actions to choose from, by learning how not to act

Abstract: Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.
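
Illustrative sketch (not from the paper): the abstract describes combining a value-based agent with a network that predicts invalid actions. One simple way to picture that combination is greedy action selection restricted to the actions that have not been eliminated. The function name `masked_greedy_action`, the `threshold` parameter, the example numbers, and the fallback when every action is eliminated are all assumptions made for this illustration; the paper's actual elimination rule may differ.

```python
from typing import Sequence


def masked_greedy_action(
    q_values: Sequence[float],
    elimination_probs: Sequence[float],
    threshold: float = 0.5,
) -> int:
    """Return the index of the highest-Q action among those not eliminated.

    q_values: hypothetical per-action value estimates from a Q-network.
    elimination_probs: hypothetical per-action probabilities, from an
        elimination network, that the action is invalid in this state.
    threshold: assumed cutoff; actions at or above it are eliminated.
    """
    if len(q_values) != len(elimination_probs):
        raise ValueError("q_values and elimination_probs must have equal length")
    if not q_values:
        raise ValueError("at least one action is required")

    # Keep only the actions the elimination model considers valid.
    allowed = [i for i, p in enumerate(elimination_probs) if p < threshold]

    # Fallback (an assumption of this sketch): if everything is eliminated,
    # consider all actions rather than returning nothing.
    if not allowed:
        allowed = list(range(len(q_values)))

    return max(allowed, key=lambda i: q_values[i])


# Action 1 has the highest Q-value but is predicted invalid, so it is skipped.
action = masked_greedy_action(
    q_values=[0.2, 0.9, 0.5, 0.1],
    elimination_probs=[0.1, 0.95, 0.2, 0.8],
)
print(action)
```

In this toy call, the agent chooses among actions 0 and 2 only, which is the sense in which learning what not to do shrinks the effective action space.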
