SERA: Soft Ensemble Reliability Aggregation for Robust Multi-Agent Reinforcement Learning.

TMLR Paper8958 Authors

15 May 2026 (modified: 04 Jun 2026) | Under review for TMLR | CC BY 4.0
Abstract: Bootstrapped temporal-difference learning inherently introduces variance into value estimates, which often destabilizes learning as the value function oscillates between overestimation and underestimation. Overestimation is commonly mitigated through pessimistic critic updates, but such bias-based approaches can introduce underestimation and do not address the estimation variance, which is often amplified in multi-agent reinforcement learning (MARL) due to its inherent learning complexities. To address this, we propose SERA, a soft ensemble reliability aggregation framework that reduces value estimation variance through reliability-aware critic aggregation. SERA constructs targets through soft reliability-weighted aggregation of critic estimates and introduces a novel decorrelation mechanism that adaptively tunes each critic’s learning rate based on temporal-difference error uncertainty and the variance of the target estimation error. This leads to more stable and reliable target estimation during training. Experiments on a wide range of multi-agent continuous-control benchmarks from MuJoCo and PettingZoo show that SERA consistently outperforms strong twin-critic and ensemble baselines, with performance improvements of up to 41.1%. We further demonstrate that the same framework generalizes well to single-agent continuous-control tasks, providing gains of up to 31.25% over established methods.
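Illustrative sketch (not taken from the paper): the abstract does not state how reliability weights are computed, so the example below assumes one plausible choice, a softmax over negative squared TD errors, to show what a soft reliability-weighted target could look like. The function names, the `temperature` parameter, and the weighting rule are all hypothetical; the decorrelation mechanism and per-critic learning-rate adaptation are not modeled here.

```python
import math


def soft_reliability_weights(td_errors, temperature=1.0):
    """Hypothetical weighting rule: softmax over negative squared TD errors.

    Critics with a smaller recent TD error receive a larger weight.
    This rule is an assumption for illustration, not the paper's definition.
    """
    scores = [-(e * e) / temperature for e in td_errors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def aggregate_target(reward, gamma, next_q_values, td_errors, temperature=1.0):
    """Build a TD target from a soft weighted average of critic estimates."""
    weights = soft_reliability_weights(td_errors, temperature)
    soft_q = sum(w * q for w, q in zip(weights, next_q_values))
    return reward + gamma * soft_q
```

Under this assumed rule, the aggregated value always lies between the smallest and largest critic estimate, in contrast to a hard minimum over critics, which always selects the most pessimistic one.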
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Adam_M_White1
Submission Number: 8958