Evolutionary Multi-Agent Reinforcement Learning for Crisis-Aware Demographic Policy Optimization

ICLR 2026 Conference Submission 13838 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Evolutionary Reinforcement Learning, Adaptive Evolutionary Booster, MADDPG, Crisis-Aware Demographic Modeling, Policy Optimization, Population Stability Metric, Real-World Regional Data, Meta-Learning, PBT Comparison
TL;DR: An adaptive evolutionary booster for MADDPG that triggers when learning stalls, improving reward and population-stability metrics in a crisis-aware demographic environment built from real regional data.
Abstract: Demographic systems face unprecedented challenges from simultaneous crises. Conventional statistical demography techniques and agent-based models often struggle to capture nonlinear inter-regional interactions during periods of severe socio-economic disruption. To address this, we propose MADDPG-EVO-DGM, a hybrid algorithm that integrates multi-agent deep reinforcement learning with evolutionary optimization and meta-learning principles to model regional demographic processes under multiple crisis scenarios. Each region is treated as an autonomous agent learning to steer demographic policy levers, while evolutionary “boosters”, triggered when learning stalls, overcome local optima via population-based perturbations of actor network parameters. Additionally, a Darwin–Gödel Machine-inspired meta-learning mechanism adapts the booster triggers, enabling self-improvement of the learning process. We evaluate MADDPG-EVO-DGM in a simulation environment calibrated with real demographic data for the eight federal districts of the Russian Federation over the period 2000–2025 and subjected to ten concurrent crisis scenarios (e.g., pandemic, geopolitical conflict, economic collapse). Experiments demonstrate significantly faster convergence and improved performance over a baseline MADDPG: the hybrid approach achieves a higher final average reward (252.57 vs. 243.07) and $3.4\times$ lower convergence variance ($\sigma = 0.24$ vs. $0.80$), indicating more reliable training. It also exhibits qualitative performance jumps of $+68\%$ during evolutionary phases and maintains 35–45\% greater resilience under crisis shocks than the baseline. To our knowledge, this is the first application of multi-agent reinforcement learning to large-scale demographic modeling under crises, opening new possibilities for evidence-based, crisis-resilient population policy design. Code, data, and logs are provided to ensure reproducibility.
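
The abstract and TL;DR name the mechanism (a stall-triggered evolutionary booster whose trigger is meta-adapted) but not its implementation. The following minimal Python/PyTorch sketch illustrates one plausible reading; the stall detector, the Gaussian perturbation of actor weights, the `eval_fn` rollout callback, and all thresholds (`window`, `eps`, `pop_size`, `sigma`) are illustrative assumptions, not the authors' code.

```python
import copy

import numpy as np
import torch


def stalled(reward_history, window=20, eps=0.5):
    """Stall detector (assumed): trigger when mean reward over the last
    `window` episodes improves on the preceding window by less than `eps`."""
    if len(reward_history) < 2 * window:
        return False
    recent = np.mean(reward_history[-window:])
    previous = np.mean(reward_history[-2 * window:-window])
    return (recent - previous) < eps


def evolutionary_booster(actor, eval_fn, pop_size=8, sigma=0.02):
    """Population-based perturbation (assumed): clone the actor, add Gaussian
    noise to each clone's weights, and keep the best-scoring candidate.
    Including the unperturbed actor makes the step non-degrading (elitism)."""
    population = [actor] + [copy.deepcopy(actor) for _ in range(pop_size)]
    with torch.no_grad():
        for clone in population[1:]:
            for p in clone.parameters():
                p.add_(sigma * torch.randn_like(p))
    scores = [eval_fn(candidate) for candidate in population]  # e.g. mean rollout reward
    return population[int(np.argmax(scores))]


class BoosterMetaController:
    """DGM-inspired self-adaptation (assumed): if a booster failed to improve
    the score, loosen the trigger and widen the mutation scale; on success,
    tighten both so gradient-based exploitation resumes."""

    def __init__(self, eps=0.5, sigma=0.02):
        self.eps, self.sigma = eps, sigma

    def update(self, improved):
        factor = 0.8 if improved else 1.25
        self.eps = float(np.clip(self.eps * factor, 0.05, 5.0))
        self.sigma = float(np.clip(self.sigma * factor, 1e-4, 0.2))
```

In a MADDPG training loop, each region's actor would call `evolutionary_booster` whenever `stalled(...)` fires, then report whether the booster raised evaluation reward to `BoosterMetaController.update`, which rescales the trigger and mutation hyperparameters for the next phase.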
Primary Area: reinforcement learning
Supplementary Material: zip
Submission Number: 13838