Deep Q-Learning with Low Switching Cost

Shusheng Xu; Simon Shaolei Du; Yi Wu

28 Sept 2020 (modified: 05 May 2023)

ICLR 2021 Conference Blind Submission

Readers: Everyone

Keywords: deep Q-network, DQN, switching cost, deep Q-learning

Abstract: We initiate the study of deep reinforcement learning problems that require low switching cost, i.e., a small number of policy switches during training. Such a requirement is ubiquitous in applications such as medical domains, recommendation systems, education, robotics, and dialogue agents, where the deployed policy that actually interacts with the environment cannot change frequently. Our paper investigates different policy switching criteria based on deep Q-networks and further proposes an adaptive approach based on the feature distance between the deployed Q-network and the underlying learning Q-network. Through extensive experiments on a medical treatment environment and a collection of Atari games, we find that our feature-switching criterion substantially decreases the switching cost while maintaining sample efficiency similar to that of the case without the low-switching-cost constraint. We also complement this empirical finding with a theoretical justification from a representation learning perspective.
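
To make the idea of a feature-distance switching criterion concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not state which distance is used, so the sketch assumes cosine similarity between feature vectors, averaged over a batch of states, and the names `cosine_similarity`, `should_switch`, and the `threshold` value of 0.98 are all hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 and norm_v == 0.0:
        return 1.0  # assumption: two all-zero feature vectors count as identical
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: exactly one zero vector counts as fully different
    return dot / (norm_u * norm_v)


def should_switch(
    deployed_features: Callable[[Vector], Vector],
    learner_features: Callable[[Vector], Vector],
    states: Sequence[Vector],
    threshold: float = 0.98,  # hypothetical value, not taken from the paper
) -> bool:
    """Return True when the learner's features have drifted from the deployed ones.

    Both callables map a state to a feature vector (for example, the
    penultimate-layer output of a Q-network). The deployed policy is
    replaced only when the mean similarity over `states` drops below
    `threshold`.
    """
    if not states:
        raise ValueError("states must be non-empty")
    sims = [
        cosine_similarity(deployed_features(s), learner_features(s))
        for s in states
    ]
    return sum(sims) / len(sims) < threshold
```

In a training loop, a check like this would run periodically on states sampled from the replay buffer, and the deployed network would be overwritten with the learner's weights only when it returns True, so the number of switches adapts to how much the learned representation has actually changed.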

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: A systematic study on deep Q-learning that requires low switching cost.

Supplementary Material: zip

Reviewed Version (pdf): https://openreview.net/references/pdf?id=kb2EBV01zl
