Keywords: Sequential social dilemma, Multi-agent reinforcement learning, Decision Transformer
TL;DR: Existing RL methods struggle with complex social dilemmas because of long action sequences and dynamic relationships. We built a simplified "Risk"-like game and an A2C Decision Transformer to study this; the results support our hypothesis that temporal context is crucial.
Abstract: Real-world decision-making involves complex sequential social dilemmas (SSDs), in which current reinforcement learning (RL) algorithms struggle because dynamic interactions and conflicting goals make the environment highly non-stationary. This is particularly evident in games like Risk, Civilization, and Diplomacy. To address this, we introduce a simplified "Risk"-inspired environment for studying explainable AI in complex SSDs; it retains key features such as stochastic outcomes and temporary alliances. Experiments show that traditional RL methods (DDPG, A2C, PPO) underperform against basic bots in this environment, suggesting that isolated states are not enough to capture opponent intentions. We also explore an A2C Decision Transformer (A2C-DT), which leverages temporal context and shows performance gains over the traditional methods.
Submission Number: 28