Abstract: Parallel data collection has redefined Reinforcement Learning (RL), unlocking unprecedented efficiency and powering breakthroughs in large-scale real-world applications. In this paradigm, $N$ identical agents operate in $N$ replicas of an environment simulator, accelerating data collection by a factor of $N$. A critical question arises: *Does specializing the policies of the parallel agents hold the key to surpassing the factor-$N$ acceleration?*
In this paper, we introduce a novel learning framework that maximizes the entropy of the data collected in a parallel setting. Our approach carefully balances the entropy of individual agents with inter-agent diversity, effectively minimizing redundancies. We implement the latter idea through a centralized policy gradient method, which shows promise when evaluated empirically against systems of identical agents and synergizes with batch RL techniques that can exploit data diversity.
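As a concrete reading of this balance (our formalization, not verbatim from the paper): let $d^{\pi_i}$ denote the state distribution induced by agent $i$ and $\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d^{\pi_i}$ the mixture over the pooled data. The mixture entropy then decomposes exactly as

$$H(\bar{d}) \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} H\!\left(d^{\pi_i}\right)}_{\text{individual entropy}} \;+\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} \mathrm{KL}\!\left(d^{\pi_i} \,\big\|\, \bar{d}\right)}_{\text{inter-agent diversity}},$$

so maximizing the entropy of the collected data jointly rewards exploratory individual agents and penalizes agents that duplicate one another's state coverage. The identity itself is standard; treating it as the paper's exact objective is our assumption.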
Finally, we provide an original concentration analysis that shows faster rates for specialized parallel sampling distributions, which supports our methodology and may be of independent interest.
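To make the centralized idea tangible, here is a minimal sketch of how an entropy-seeking intrinsic reward can be estimated over the pooled experience of the $N$ agents, assuming a nonparametric $k$-NN entropy estimate; the function name, estimator choice, and hyperparameters are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def knn_entropy_rewards(pooled_states: np.ndarray, k: int = 5) -> np.ndarray:
    """Per-state intrinsic rewards from a k-NN entropy estimate (sketch).

    pooled_states: (n, d) array of states gathered from all N agents.
    Up to additive constants, the average log k-NN distance is a
    consistent estimate of the state entropy, so each term can serve
    as a per-state intrinsic reward in a policy gradient update.
    """
    # Pairwise Euclidean distances between all pooled states.
    diffs = pooled_states[:, None, :] - pooled_states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self-distances
    # Distance from each state to its k-th nearest neighbour.
    knn_dist = np.sort(dists, axis=1)[:, k - 1]
    # Reward grows with local sparsity: states in under-visited regions
    # of the *pooled* dataset earn more, discouraging redundant visits.
    return np.log(knn_dist + 1e-8)
```

Because the rewards are computed on the pooled dataset, each agent's update "sees" the states visited by every other agent: an agent revisiting regions already covered by its peers receives low reward, which is one way the individual-entropy and diversity terms above can be traded off in a centralized update.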
Lay Summary: Modern reinforcement learning systems often speed up learning by running many identical agents in parallel environments, collecting data much faster than a single agent could. But this raises an important question: could we go even further by letting these agents specialize, instead of making them all act the same?
In our work, we propose a new approach in which each agent explores the environment in its own way. We design a method that encourages agents to behave differently, increasing the variety of the data they collect. This reduces redundancy and makes the overall dataset more informative. We also use a centralized learning technique to coordinate the agents efficiently.
Our results show that this method improves learning speed and quality compared to traditional identical-agent systems. It also works well with other RL strategies that benefit from diverse data. We back our findings with theoretical analysis, suggesting that smarter specialization in parallel RL could significantly advance real-world AI applications.
Link To Code: https://github.com/enzodepaola/Enhancing-Diversity-in-Parallel-Agents-A-Maximum-State-Entropy-Exploration-Story.git
Primary Area: Reinforcement Learning
Keywords: Maximum State Entropy, Parallel Reinforcement Learning, Agents' Diversity
Submission Number: 6901