Keywords: Offline Reinforcement Learning, Diversity and Performance, Homogeneous Dataset
TL;DR: We introduce an intrinsic reward mechanism that enhances behavioral diversity without sacrificing performance in offline reinforcement learning.
Abstract: We investigate the challenge of promoting diversity in offline reinforcement learning (RL), where agents must develop diverse strategies despite being trained on homogeneous datasets with limited behavioral variation. Existing offline RL approaches, including those leveraging expectation-maximization algorithms for unsupervised clustering, often suffer either insufficient diversity or degraded performance in such settings. To overcome these limitations, we introduce a novel Unique Behavior objective function that can be \emph{directly computed to quantify the distinctiveness between agents}, eliminating the need for additional estimators and reducing potential estimation errors. By maximizing this uniqueness measure, our approach encourages agents to learn diverse behaviors even when the training data lacks variety. Extensive experiments on the D4RL MuJoCo and Atari benchmarks demonstrate that our method achieves significant behavioral diversity while maintaining strong performance, even when trained on homogeneous data.
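The abstract does not spell out the objective itself; the sketch below only illustrates the general idea of a diversity bonus that is computed directly from the agents' policies, with no learned discriminator or density estimator. The function name `uniqueness_bonus`, the pairwise squared-L2 form, and the discrete-action setup are all illustrative assumptions, not the paper's actual definition.

```python
import torch

def uniqueness_bonus(action_probs: torch.Tensor) -> torch.Tensor:
    """Directly computed diversity bonus for a population of agents.

    action_probs: (n_agents, batch, n_actions) -- each agent's action
    distribution evaluated on the same batch of dataset states. Returns
    one scalar per agent: its mean squared distance to every other
    agent's distribution. The quantity is closed-form in the policies,
    so no auxiliary estimator is needed.
    """
    n = action_probs.shape[0]
    # Pairwise squared L2 distances between agents' action distributions,
    # averaged over the batch of states: shape (n_agents, n_agents).
    diff = action_probs.unsqueeze(1) - action_probs.unsqueeze(0)
    pairwise = diff.pow(2).sum(-1).mean(-1)
    # Each agent's uniqueness: average distance to the other agents.
    return pairwise.sum(1) / (n - 1)

# Example: 4 agents, 32 dataset states, 6 discrete actions.
probs = torch.softmax(torch.randn(4, 32, 6), dim=-1)
bonus = uniqueness_bonus(probs)  # shape (4,), one intrinsic bonus per agent
```

In training, a term of this kind would presumably be added, with some weight, to each agent's offline RL loss, so that maximizing it pushes the population's behaviors apart while the base objective preserves task performance.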
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 7724