Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Baihan Lin; Djallel Bouneffouf; Guillermo Cecchi

Published: 25 Apr 2022 | Last Modified: 26 May 2025 | ICLR 2022 Workshop on Gamification and Multiagent Solutions | Readers: Everyone

Keywords: Bandits, Online learning, Iterated Prisoner's Dilemma, Reinforcement learning

TL;DR: We investigate the behaviors of different reward-driven online learning agents in a multi-agent Iterated Prisoner's Dilemma setting.

Abstract: Studies of the Prisoner’s Dilemma mainly treat the choice to cooperate or defect as an atomic action. We propose to study the behavior of online learning algorithms in the Iterated Prisoner’s Dilemma (IPD) game, exploring the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits, and reinforcement learning. We evaluate them in an IPD tournament in which multiple agents compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested, independent, reward-driven agents, and to study the capacity of these algorithms to fit human behavior. Results suggest that making decisions based on the current situation is the worst strategy in this kind of social dilemma game. Multiple findings on online learning behaviors, along with clinical validations, are reported.
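To make the setting in the abstract concrete, here is a minimal sketch of one IPD match between a context-free bandit learner and a fixed opponent. It is an illustration under stated assumptions, not the authors' implementation: the payoff values (the conventional 5/3/1/0), the epsilon-greedy rule, the tit-for-tat opponent, and the names `EpsilonGreedyBandit`, `TitForTat`, and `play_match` are all hypothetical choices that do not come from the abstract.

```python
import random

C, D = 0, 1  # cooperate, defect

# Assumed payoffs for (my_action, opponent_action); the conventional
# 5/3/1/0 values are used for illustration and are not from the abstract.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


class EpsilonGreedyBandit:
    """Context-free two-armed bandit; the arms are C and D."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0, 0]
        self.values = [0.0, 0.0]  # running mean reward per arm

    def act(self, opponent_last):
        # A plain bandit ignores the opponent's previous move (no context).
        if self.rng.random() < self.epsilon:
            return self.rng.choice([C, D])
        return max((C, D), key=lambda a: self.values[a])

    def update(self, action, reward):
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class TitForTat:
    """Fixed baseline: cooperate first, then copy the opponent's last move."""

    def act(self, opponent_last):
        return C if opponent_last is None else opponent_last

    def update(self, action, reward):
        pass  # no learning


def play_match(agent_a, agent_b, rounds=200):
    """Play one iterated match; return both total scores and the move history."""
    last_a = last_b = None
    score_a = score_b = 0
    history = []
    for _ in range(rounds):
        a = agent_a.act(last_b)
        b = agent_b.act(last_a)
        r_a, r_b = PAYOFF[(a, b)], PAYOFF[(b, a)]
        agent_a.update(a, r_a)
        agent_b.update(b, r_b)
        score_a += r_a
        score_b += r_b
        history.append((a, b))
        last_a, last_b = a, b
    return score_a, score_b, history


if __name__ == "__main__":
    score_a, score_b, history = play_match(EpsilonGreedyBandit(), TitForTat())
    coop_rate = sum(a == C for a, _ in history) / len(history)
    print(score_a, score_b, round(coop_rate, 2))
```

A tournament in the sense of the abstract would repeat `play_match` over every pairing of agents. A contextual bandit or reinforcement learning agent would differ from this sketch mainly in that `act` would use `opponent_last` (or a longer history) as context or state.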

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/online-learning-in-iterated-prisoner-s/code)
