Novel Policy Seeking with Constrained Optimization

Hao Sun; Zhenghao Peng; Bo Dai; Jian Guo; Dahua Lin; Bolei Zhou

28 Sept 2020 (modified: 22 Oct 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone

Keywords: Novel Policy Seeking, Reinforcement Learning, Constrained Optimization

Abstract: We address the problem of seeking novel policies in reinforcement learning tasks. Instead of following the multi-objective framework commonly used in existing methods, we propose to rethink the problem from the perspective of constrained optimization. We first introduce a new metric to evaluate the difference between policies, and then design two practical novel policy seeking methods following this perspective: the Constrained Task Novel Bisector (CTNB) and the Interior Policy Differentiation (IPD), which correspond to the feasible direction method and the interior point method, respectively, from the constrained optimization literature. Experimental comparisons on the MuJoCo control suite show that our methods achieve substantial improvements over previous novelty-seeking methods in terms of both the novelty of policies and their performance on the primal task.
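To make the contrast in the abstract concrete, the following is a minimal toy sketch, not the paper's CTNB or IPD algorithms. It assumes a scalar "policy parameter" `x`, a hypothetical task reward peaked at the parameter of an already-found policy (`X_REF`), a hypothetical novelty measure `|x - X_REF|`, and a hypothetical novelty threshold `R0`. It compares a weighted-sum (multi-objective) objective with a classical log-barrier interior point scheme that treats novelty as a constraint.

```python
# Toy illustration only: all names and constants below are hypothetical.
X_REF = 1.0   # parameter of a previously found policy
R0 = 0.5      # required minimum novelty (constraint threshold)

def task_reward(x):
    # Task reward peaks exactly at the existing policy.
    return -(x - 1.0) ** 2

def novelty(x):
    # Distance to the existing policy, standing in for a policy metric.
    return abs(x - X_REF)

def grad_task(x):
    return -2.0 * (x - 1.0)

def grad_novelty(x):
    return 1.0 if x >= X_REF else -1.0

def weighted_sum_ascent(x, w, lr=0.05, steps=2000):
    """Multi-objective view: ascend task_reward + w * novelty."""
    for _ in range(steps):
        x += lr * (grad_task(x) + w * grad_novelty(x))
    return x

def barrier_ascent(x, mu=1.0, lr=0.05, inner_steps=200, outer_steps=5):
    """Constrained view: ascend task_reward + mu * log(novelty - R0),
    shrinking mu so the iterate stays strictly feasible while the
    barrier's influence fades."""
    assert novelty(x) > R0, "interior point methods need a feasible start"
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            slack = novelty(x) - R0
            step = lr * (grad_task(x) + mu * grad_novelty(x) / slack)
            # Halve the step until the next iterate is strictly feasible.
            while novelty(x + step) <= R0:
                step *= 0.5
            x += step
        mu *= 0.5
    return x
```

In this toy setting, `weighted_sum_ascent(3.0, w=0.2)` settles at a novelty of roughly 0.1, below `R0`, so the achieved novelty depends entirely on how `w` is tuned. `barrier_ascent(3.0)` ends slightly above `R0`, satisfying the novelty requirement by construction while giving up only as much task reward as the constraint demands.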

One-sentence Summary: We address the problem of seeking novel policies in reinforcement learning tasks, using constrained optimization to generate diverse policies that also perform well.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2005.10696/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=aO2MWpJaWo
