Keywords: decentralized multi-agent reinforcement learning, temporal difference error, non-stationarity, co-adaptation
TL;DR: NGES is a diagnostic that tests how often TD-error sharing selects redundant, familiar transitions by adding a novelty gate and measuring what gets filtered.
Abstract: Decentralized multi-agent reinforcement learning (MARL) can learn faster
when agents selectively share informative experiences. To that end, current approaches prioritize transitions with high temporal-difference (TD) error as a proxy for informativeness, following the intuition that “surprising” or previously unseen transitions carry the most learning signal. However, we identify a familiarity paradox:
in non-stationary multi-agent settings, high TD-error can persist in frequently visited states because co-adapting agents keep changing their policies, so TD-error conflates epistemic uncertainty with aleatoric noise. To test the practical impact of this phenomenon,
we propose Novelty-Gated Experience Sharing (NGES), a dual-gate mechanism
that shares transitions only when they are both surprising (high TD-error) and
novel (low state visitation count). A hash-resolution ablation reveals that up to 30%
of high TD-error transitions selected for sharing are redundant, and retroactive
analysis confirms that blocked experiences exhibit 1.33× higher TD-error than
shared ones, providing direct evidence for the paradox. However, filtering these
transitions yields comparable rather than improved performance relative to TD-error-only sharing and introduces higher seed-to-seed variance, suggesting that
hard novelty filtering can occasionally suppress coordination-critical transitions.
Consequently, we characterize NGES as a diagnostic probe for when TD-error
prioritization over-selects familiar states, and show that the paradox’s practical
impact is domain-dependent.
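The dual gate and the two diagnostics mentioned in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the NGES implementation: states are discretized at a hypothetical `resolution` and hashed into visitation counts, and `td_threshold` and `count_threshold` are hypothetical fixed cutoffs. The hypothetical helper `filter_stats` computes (a) the fraction of high TD-error transitions that the novelty gate blocks as familiar and (b) the ratio of mean |TD-error| between blocked and shared transitions.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class NoveltyGate:
    """Hypothetical dual gate: share only if |TD-error| is high AND the state is rarely visited."""
    td_threshold: float      # assumed fixed |TD-error| cutoff ("surprising")
    count_threshold: int     # assumed max visitation count still treated as "novel"
    resolution: float = 0.1  # assumed discretization step applied before hashing
    counts: Counter = field(default_factory=Counter)

    def state_key(self, state):
        # Coarser resolution maps more distinct states to the same hash bucket.
        bins = tuple(round(x / self.resolution) for x in state)
        return hashlib.sha1(repr(bins).encode()).hexdigest()

    def decide(self, state, td_error):
        """Record a visit, then return (share, surprising) for this transition."""
        key = self.state_key(state)
        self.counts[key] += 1
        surprising = abs(td_error) >= self.td_threshold
        novel = self.counts[key] <= self.count_threshold
        return surprising and novel, surprising


def filter_stats(gate, transitions):
    """Fraction of TD-selected transitions blocked as familiar, and blocked/shared mean |TD| ratio."""
    shared, blocked = [], []
    for state, td_error in transitions:
        share, surprising = gate.decide(state, td_error)
        if share:
            shared.append(abs(td_error))
        elif surprising:
            # Passed the TD-error gate but failed the novelty gate.
            blocked.append(abs(td_error))
    n_selected = len(shared) + len(blocked)
    redundant_frac = len(blocked) / n_selected if n_selected else 0.0
    ratio = mean(blocked) / mean(shared) if blocked and shared else float("nan")
    return redundant_frac, ratio


# Toy stream of (state, TD-error) pairs; values are illustrative only.
gate = NoveltyGate(td_threshold=1.0, count_threshold=1, resolution=0.5)
transitions = [
    ((0.0, 0.0), 1.2),    # first visit to its bucket: shared
    ((0.1, 0.0), 2.0),    # same bucket, revisited: blocked
    ((5.0, 5.0), 1.0),    # new bucket: shared
    ((0.0, 0.1), 0.5),    # below TD threshold: neither
    ((0.05, 0.0), -1.6),  # familiar bucket: blocked
]
redundant_frac, ratio = filter_stats(gate, transitions)
print(f"redundant fraction = {redundant_frac:.2f}, blocked/shared |TD| = {ratio:.2f}")
```

Because the gate is a hard AND, a coarser `resolution` merges more states into each bucket, raises their counts, and therefore blocks more high TD-error transitions; in this sketch that is the parameter a hash-resolution ablation would vary.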
Submission Number: 87