Keywords: Offline Reinforcement Learning, Diversity and Performance, Homogeneous Dataset
TL;DR: We introduce an intrinsic reward mechanism that enhances behavioral diversity without sacrificing performance in offline reinforcement learning.
Abstract: We investigate the challenge of promoting diversity in offline reinforcement learning (RL), where agents must develop diverse strategies despite being trained on homogeneous datasets with limited behavioral variation. Existing offline RL approaches, including those leveraging expectation-maximization algorithms for unsupervised clustering, often suffer either insufficient diversity or degraded performance in such settings. To overcome these limitations, we introduce a novel Unique Behavior objective function that can be \emph{directly computed to quantify the distinctiveness between agents}, eliminating the need for additional estimators and reducing potential estimation errors. By maximizing this uniqueness measure, our approach encourages agents to learn diverse behaviors even when the training data lacks variety. Extensive experiments on the D4RL MuJoCo and Atari benchmarks demonstrate that our method achieves significant behavioral diversity while maintaining strong performance, even when trained on homogeneous data.
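The abstract does not spell out the objective itself; the sketch below only illustrates the general idea of a diversity bonus that is computed directly from the agents' policies, with no learned discriminator or density estimator. The function name `uniqueness_bonus`, the pairwise squared-L2 form, and the discrete-action setup are all illustrative assumptions, not the paper's actual definition.

```python
import torch

def uniqueness_bonus(action_probs: torch.Tensor) -> torch.Tensor:
    """Directly computed diversity bonus for a population of agents.

    action_probs: (n_agents, batch, n_actions) -- each agent's action
    distribution evaluated on the same batch of dataset states. Returns
    one scalar per agent: its mean squared distance to every other
    agent's distribution. The quantity is closed-form in the policies,
    so no auxiliary estimator is needed.
    """
    n = action_probs.shape[0]
    # Pairwise squared L2 distances between agents' action distributions,
    # averaged over the batch of states: shape (n_agents, n_agents).
    diff = action_probs.unsqueeze(1) - action_probs.unsqueeze(0)
    pairwise = diff.pow(2).sum(-1).mean(-1)
    # Each agent's uniqueness: average distance to the other agents.
    return pairwise.sum(1) / (n - 1)

# Example: 4 agents, 32 dataset states, 6 discrete actions.
probs = torch.softmax(torch.randn(4, 32, 6), dim=-1)
bonus = uniqueness_bonus(probs)  # shape (4,), one intrinsic bonus per agent
```

In training, a term of this kind would presumably be added, with some weight, to each agent's offline RL loss, so that maximizing it pushes the population's behaviors apart while the base objective preserves task performance.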
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 7724