In offline reinforcement learning, most existing methods have focused primarily on optimizing performance, often neglecting the promotion of diverse behaviors. While some approaches generate diverse behaviors from well-constructed, heterogeneous datasets, their effectiveness drops significantly when applied to less diverse data. To address this, we introduce a novel intrinsic reward mechanism that encourages behavioral diversity regardless of the dataset's heterogeneity. By maximizing the mutual information between actions and policies under each state, our approach enables agents to learn a variety of behaviors, including those not explicitly represented in the data. Although performing out-of-distribution actions can lead to risky outcomes, we mitigate this risk by incorporating the ensemble-diversified actor-critic (EDAC) method to estimate Q-value uncertainty, preventing agents from adopting suboptimal behaviors. In experiments on MuJoCo tasks from the D4RL benchmark, we demonstrate that our method achieves behavioral diversity while maintaining performance on both heterogeneous and homogeneous datasets.
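To make the two ingredients named in the abstract concrete, the sketch below shows one plausible instantiation, not the authors' exact formulation: a discriminator-based variational lower bound on the mutual information between actions and policy indices (yielding an intrinsic reward of the form log q(z | s, a) − log p(z)), and an EDAC-style pessimistic target that takes the minimum over an ensemble of Q-networks. All class and function names (`PolicyDiscriminator`, `intrinsic_reward`, `conservative_q_target`) and the network sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyDiscriminator(nn.Module):
    """Predicts which policy index z produced action a in state s.
    Its log-likelihood gives a variational lower bound on I(action; policy | state).
    (Hypothetical module; architecture is an assumption.)"""
    def __init__(self, state_dim, action_dim, num_policies, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_policies),
        )

    def forward(self, state, action):
        # Returns logits over policy indices.
        return self.net(torch.cat([state, action], dim=-1))


def intrinsic_reward(disc, state, action, policy_idx, num_policies):
    """r_int = log q(z | s, a) - log p(z), assuming a uniform prior p(z) = 1/num_policies."""
    with torch.no_grad():
        log_q = F.log_softmax(disc(state, action), dim=-1)
        log_q_z = log_q.gather(-1, policy_idx.unsqueeze(-1)).squeeze(-1)
    return log_q_z + torch.log(torch.tensor(float(num_policies)))


def conservative_q_target(q_ensemble, next_state, next_action):
    """EDAC-style pessimism: take the minimum over an ensemble of Q-networks, so that
    high ensemble disagreement (uncertainty) lowers the target for out-of-distribution actions."""
    qs = torch.stack([q(next_state, next_action) for q in q_ensemble], dim=0)
    return qs.min(dim=0).values
```

In such a setup, the intrinsic reward would be added (with some weight) to the dataset reward when training each policy, while the ensemble-minimum target discourages the diversity bonus from pushing the agent toward actions whose value the critic ensemble cannot estimate reliably.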