BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Xinyue Chen; Zijian Zhou; Zheng Wang; Che Wang; Yanqiu Wu; Qing Deng; Keith Ross

25 Sept 2019 (modified: 22 Jun 2025). ICLR 2020 Conference Blind Submission. Readers: Everyone

Keywords: Deep Reinforcement Learning, Batch Reinforcement Learning, Sample Efficiency

TL;DR: We propose a new Batch Reinforcement Learning algorithm achieving state-of-the-art performance.

Abstract: The field of Deep Reinforcement Learning (DRL) has recently seen a surge of research on batch reinforcement learning, which aims for sample-efficient learning from a given data set without additional interactions with the environment. In the batch DRL setting, commonly employed off-policy DRL algorithms can perform poorly and sometimes even fail to learn altogether. In this paper we propose a new algorithm, Best-Action Imitation Learning (BAIL), which, unlike many off-policy DRL algorithms, does not involve maximizing Q functions over the action space. Striving for simplicity as well as performance, BAIL first selects from the batch the actions it believes to be high-performing for their corresponding states; it then uses those state-action pairs to train a policy network using imitation learning. Although BAIL is simple, we demonstrate that it achieves state-of-the-art performance on the MuJoCo benchmark, typically outperforming Batch-Constrained deep Q-Learning (BCQ) by a wide margin.
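The two-stage idea in the abstract (select promising actions from the batch, then imitate them) can be illustrated with a minimal sketch. The abstract does not say how BAIL decides which actions are high-performing, so everything here is an assumption made for illustration: the selection rule is a plain global threshold on Monte Carlo returns (the paper's selection is per state), a linear least-squares fit stands in for the policy network, and the function names and `keep_fraction` parameter are hypothetical.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def select_best_actions(states, actions, returns, keep_fraction=0.25):
    """Keep state-action pairs whose return lies in the top keep_fraction.

    ASSUMPTION: a global return quantile is a stand-in selection rule,
    not the rule used by BAIL, which judges actions relative to their states.
    """
    threshold = np.quantile(returns, 1.0 - keep_fraction)
    mask = returns >= threshold
    return states[mask], actions[mask]

def fit_linear_policy(states, actions):
    """Imitation step: least-squares fit of actions = [states, 1] @ weights.

    ASSUMPTION: a linear model replaces the policy network for brevity.
    """
    features = np.hstack([states, np.ones((len(states), 1))])
    weights, *_ = np.linalg.lstsq(features, actions, rcond=None)
    return weights

def act(weights, state):
    """Apply the cloned linear policy to a single state vector."""
    return np.append(state, 1.0) @ weights
```

In use, one would compute `discounted_returns` per episode in the batch, concatenate the results, pass them to `select_best_actions`, and fit the policy on the surviving pairs. Note that no maximization of a Q function over the action space appears anywhere in this pipeline, which is the property the abstract highlights.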

Code: https://anonymous.4open.science/r/e5fbe703-a32d-4679-a2a8-095e74b96e85/

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/bail-best-action-imitation-learning-for-batch/code)
