Keywords: Deep reinforcement learning, Weak bisimulation metric, Representation learning, Sparse reward
TL;DR: To overcome bisimulation metrics' unstable representations in sparse reward settings, we present a weak bisimulation metric-based scalable representation learning approach for deep reinforcement learning, which outperforms SOTA baselines.
Abstract: Recent studies have shown that bisimulation metrics excel at extracting features relevant to reinforcement learning tasks. However, constrained by strict assumptions and the inherent conflict between metric learning and sparse rewards, they suffer from severe representation degeneration, and even collapse, in sparse-reward settings. To tackle these problems, we propose a reward-free weak bisimulation metric-based scalable representation learning approach (SRL). Specifically, we first introduce the weak bisimulation metric, which bypasses the intractable reward-difference term and instead leverages a trainable Gaussian distribution to relax traditional bisimulation metrics. In particular, the Gaussian noise creates a flexible information margin for the metric optimization, which mitigates potential representation collapse caused by sparse rewards. Moreover, because the metric is built purely on distributions internally, it also mitigates the representation degeneration that results from inconsistent computations under strict assumptions. To tighten the metric, we additionally account for continuous differences over the transition distribution, improving the accuracy of the initial transition-distribution difference and strengthening the extraction of equivalent task features. We evaluate SRL on challenging DeepMind Control Suite, MetaWorld, and Adroit tasks with sparse rewards. Empirical results demonstrate that SRL significantly outperforms state-of-the-art baselines across various tasks. The source code will be made available later.
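The abstract's core idea can be illustrated with a minimal sketch. The following is an illustrative reconstruction, not the authors' implementation: in classic bisimulation-metric learning, embedding distances are regressed toward a target combining a reward difference with a transition-distribution distance; here the (often zero, hence collapse-inducing) reward-difference term is replaced by a sample from a trainable Gaussian, creating a slack margin. All names (`weak_bisim_targets`, `bisim_loss`) and the use of successor-embedding distances as a stand-in for the transition-distribution difference are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_bisim_targets(z_next_i, z_next_j, mu, log_sigma, gamma=0.99):
    """Hypothetical target for the weak bisimulation metric.

    Assumption: the transition-distribution difference is approximated by
    the distance between successor-state embeddings.
    """
    trans_diff = np.linalg.norm(z_next_i - z_next_j, axis=-1)
    # Trainable Gaussian noise replaces the intractable reward difference,
    # providing a flexible information margin under sparse rewards.
    eps = mu + np.exp(log_sigma) * rng.standard_normal(trans_diff.shape)
    return np.abs(eps) + gamma * trans_diff

def bisim_loss(z_i, z_j, targets):
    # Regress current embedding distances toward the relaxed targets.
    dist = np.linalg.norm(z_i - z_j, axis=-1)
    return float(np.mean((dist - targets) ** 2))
```

In a full agent, `mu` and `log_sigma` would be optimized jointly with the encoder; the sketch only shows how the Gaussian term keeps the regression target nonzero even when all sampled rewards are zero.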
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13806