Defection at First Sight: Learning Partner Selection in Optional Social Dilemmas without Prior Information

Published: 19 Dec 2025, Last Modified: 05 Jan 2026, Venue: AAMAS 2026 Full, License: CC BY 4.0
Keywords: Social Dilemmas, Partner Selection, Emergence of Cooperation
Abstract: We study populations of self-interested agents playing a two-person repeated Prisoner's Dilemma game with the option of opting out of the interaction and instead being randomly assigned to a new partner in the population. In contrast to previous work, we remove the assumption that agents know the previous move of every other agent in the game, even when not directly interacting with them. Instead, agents adopt interaction-length-dependent policies: in the first round they act without any information about their opponent, while the observed behaviour informs the choice of action in subsequent rounds. Using multi-agent reinforcement learning, we show that in this setup cooperation can emerge and be sustained without any hard-wired partner selection or trigger-restart mechanisms. In the initial interactions, agents learn to defect before adopting reciprocal strategies such as Tit-for-Tat, a pattern known in the literature as the "hazing period". Interestingly, agents learn to stay unconditionally in initial interactions before adopting known cooperation-promoting partner selection rules such as Out-for-Tat (leaving defectors and staying with cooperators) in subsequent rounds. Finally, we show how this scales up to agents with longer interaction-length-dependent policies.
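Illustrative sketch (not from the paper): the decision rules named in the abstract can be written as a hand-coded, interaction-length-dependent policy. The paper's agents learn their policies through multi-agent reinforcement learning, so this is only a minimal reading of the described end behaviour, not the authors' method. The function name `hazing_policy`, the move and choice labels, and the zero-based round index are all assumptions introduced for illustration. The abstract does not specify payoffs, the timing of the stay/leave decision, or how cooperation resumes after the hazing period, so none of those are modelled here.

```python
from typing import Optional, Tuple

# Hypothetical labels; the abstract does not prescribe an encoding.
COOPERATE, DEFECT = "C", "D"
STAY, LEAVE = "stay", "leave"


def hazing_policy(round_index: int, partner_last_move: Optional[str]) -> Tuple[str, str]:
    """Return (move, partner_choice) for one round of an interaction.

    round_index counts rounds with the current partner, starting at 0.
    partner_last_move is None when nothing has been observed yet.
    """
    # First round: no information about the partner is available.
    # Per the abstract, agents defect and stay unconditionally here.
    if round_index == 0 or partner_last_move is None:
        return DEFECT, STAY

    # Later rounds, Tit-for-Tat: copy the partner's last observed move.
    move = partner_last_move
    # Later rounds, Out-for-Tat: stay with cooperators, leave defectors.
    partner_choice = STAY if partner_last_move == COOPERATE else LEAVE
    return move, partner_choice
```

For example, `hazing_policy(0, None)` yields a defect-and-stay decision, while `hazing_policy(1, "C")` yields cooperate-and-stay. A policy with a longer interaction-length dependence would branch on more values of `round_index` rather than treating all rounds after the first identically.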
Area: Modelling and Simulation of Societies (SIM)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 167