Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Continuous action space, Deep learning, Multi-scale change point detection, Off-policy evaluation
Abstract: We consider off-policy evaluation (OPE) in continuous action domains, such as dynamic pricing and personalized dose finding. In OPE, one aims to learn the value of a new policy using historical data generated by a different behavior policy. Most existing work on OPE focuses on discrete action domains. To handle continuous action spaces, we develop a new deep jump Q-evaluation method for OPE. The key ingredient of our method is to adaptively discretize the action space using deep jump Q-learning, which allows us to apply existing OPE methods for discrete domains to continuous actions. Our method is supported by theoretical results and by experiments on synthetic and real datasets.
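To make the core idea concrete, below is a minimal, hypothetical sketch of adaptive action-space discretization for OPE, under strong simplifying assumptions. It is not the paper's algorithm: the paper fits deep neural Q-functions with multi-scale change point detection, whereas this toy replaces both with per-segment reward means and a penalized dynamic program over a fixed one-dimensional grid, and it ignores the state in the segment model. The function names `discretize_actions` and `evaluate_policy` are invented for illustration.

```python
import numpy as np

def discretize_actions(actions, rewards, n_grid=8, penalty=0.01):
    """Partition the action space [0, 1] into segments (toy stand-in for
    deep jump Q-learning): a dynamic program chooses cut points on a fixed
    grid to minimize within-segment squared error plus a per-segment penalty,
    mimicking change point detection along the action axis."""
    grid = np.linspace(0.0, 1.0, n_grid + 1)

    def sse(lo, hi):
        # Within-segment squared error of rewards for actions in [lo, hi).
        r = rewards[(actions >= lo) & (actions < hi)]
        return ((r - r.mean()) ** 2).sum() if r.size else 0.0

    best_cost = np.full(n_grid + 1, np.inf)
    best_prev = np.zeros(n_grid + 1, dtype=int)
    best_cost[0] = 0.0
    for j in range(1, n_grid + 1):
        for i in range(j):
            c = best_cost[i] + sse(grid[i], grid[j]) + penalty * len(actions)
            if c < best_cost[j]:
                best_cost[j], best_prev[j] = c, i

    # Trace back the chosen cut points.
    cuts, j = [], n_grid
    while j > 0:
        cuts.append(grid[j])
        j = best_prev[j]
    return np.array(sorted(set([0.0] + cuts)))

def evaluate_policy(policy, states, actions, rewards, cuts):
    """Plug-in value estimate: mean observed reward of the segment that
    contains the target policy's action (a stand-in for running a
    discrete-domain OPE estimator on the learned partition)."""
    seg_mean = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mask = (actions >= lo) & (actions < hi)
        seg_mean.append(rewards[mask].mean() if mask.any() else 0.0)
    idx = np.clip(np.searchsorted(cuts, policy(states), side="right") - 1,
                  0, len(seg_mean) - 1)
    return float(np.mean(np.asarray(seg_mean)[idx]))

# Toy demo: piecewise-constant reward in the action, uniform behavior policy.
rng = np.random.default_rng(0)
states = rng.normal(size=500)
actions = rng.uniform(size=500)
rewards = np.where(actions < 0.5, 1.0, 2.0) + 0.1 * rng.normal(size=500)

cuts = discretize_actions(actions, rewards)
value = evaluate_policy(lambda s: np.full_like(s, 0.8), states, actions, rewards, cuts)
print(cuts, round(value, 2))
```

On this toy data the dynamic program should place a single cut near the reward jump at 0.5, and the plug-in value for a target policy that always plays a = 0.8 should land near the upper segment mean of 2; the penalty term is what keeps the partition from fragmenting into many small segments.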
One-sentence Summary: We develop deep jump Q-evaluation for off-policy evaluation in continuous action domains by adaptively discretizing the action space.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=UB0K5jqnbG