Novelty-Gated Experience Sharing for Multi-Agent Reinforcement Learning

Published: 02 Mar 2026, Last Modified: 04 Apr 2026 | MALGAI | Everyone | CC BY 4.0
Keywords: decentralized multi-agent reinforcement learning, temporal difference error, non-stationarity, co-adaptation
TL;DR: NGES is a diagnostic that tests how often TD-error sharing selects redundant, familiar transitions by adding a novelty gate and measuring what gets filtered.
Abstract: Decentralized multi-agent reinforcement learning (MARL) can learn faster when agents selectively share informative experiences. Current approaches prioritize transitions with high temporal-difference (TD) error as a proxy for informativeness, following the intuition that “surprising” or previously unseen transitions carry the most learning signal. However, we identify a familiarity paradox: in non-stationary multi-agent settings, high TD-error can persist in frequently visited states because co-adapting agents keep changing their policies, which conflates epistemic uncertainty with aleatoric noise. To test the practical impact of this phenomenon, we propose Novelty-Gated Experience Sharing (NGES), a dual-gate mechanism that shares a transition only when it is both surprising (high TD-error) and novel (low state visitation count). A hash-resolution ablation reveals that up to 30% of the high TD-error transitions selected for sharing are redundant, and a retroactive analysis confirms that blocked experiences exhibit 1.33× higher TD-error than shared ones, providing direct evidence for the paradox. However, filtering these transitions yields comparable rather than improved performance relative to TD-error-only sharing, and it introduces higher seed-to-seed variance, suggesting that hard novelty filtering can occasionally suppress coordination-critical transitions. Consequently, we characterize NGES as a diagnostic probe for detecting when TD-error prioritization over-selects familiar states, and we show that the paradox’s practical impact is domain-dependent.
Submission Number: 87
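
The dual gate described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `NoveltyGate`, the parameters `td_threshold`, `max_visits`, and `resolution`, the floor-based state hashing, and the three outcome labels are all assumptions introduced here for illustration. The sketch labels each transition as shared, blocked for familiarity, or ignored for low TD-error, so that the fraction of high TD-error transitions the novelty gate filters out can be measured.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


class NoveltyGate:
    """Hypothetical dual gate: share only if surprising AND novel."""

    def __init__(self, td_threshold: float, max_visits: int, resolution: float):
        self.td_threshold = td_threshold  # assumed cutoff for "surprising"
        self.max_visits = max_visits      # assumed cutoff for "novel"
        self.resolution = resolution      # assumed bucket width for hashing
        self.visits: Dict[Tuple[int, ...], int] = defaultdict(int)

    def state_key(self, state: Sequence[float]) -> Tuple[int, ...]:
        # A coarser resolution maps more states into the same bucket.
        return tuple(int(x // self.resolution) for x in state)

    def decide(self, state: Sequence[float], td_error: float) -> str:
        key = self.state_key(state)
        seen_before = self.visits[key]  # visits prior to this transition
        self.visits[key] += 1
        if abs(td_error) < self.td_threshold:
            return "low_td"            # fails the TD-error gate
        if seen_before >= self.max_visits:
            return "blocked_familiar"  # surprising, but in a familiar state
        return "shared"                # passes both gates


gate = NoveltyGate(td_threshold=1.0, max_visits=2, resolution=0.5)
# The same state keeps producing a high TD-error, as in the familiarity paradox.
outcomes = [gate.decide([0.1, 0.2], td_error=2.0) for _ in range(4)]
print(outcomes)  # ['shared', 'shared', 'blocked_familiar', 'blocked_familiar']
```

In this toy run, half of the high TD-error transitions are blocked because their state bucket has already been visited `max_visits` times; counting such `blocked_familiar` outcomes is one way a diagnostic of this kind could quantify redundancy.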