Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering

ACL ARR 2025 February Submission2735 Authors

15 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: As Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important. However, they often struggle with cooperation, leading to suboptimal outcomes. Inspired by Axelrod’s Iterated Prisoner’s Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation. Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making. Our results show that higher Agreeableness and Conscientiousness improve cooperation but increase susceptibility to exploitation, highlighting both the potential and limitations of personality-based steering for aligning AI agents.
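To make the IPD setting and the notion of "susceptibility to exploitation" concrete, here is a minimal sketch of an iterated match. It assumes Axelrod's classic payoffs (T=5, R=3, P=1, S=0). The abstract does not state which payoff values the paper uses. The strategy names (`always_cooperate`, `always_defect`, `tit_for_tat`) and the helper `play_ipd` are illustrative and not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Assumed Axelrod-style payoffs (T=5 > R=3 > P=1 > S=0); the paper's values may differ.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),  # mutual cooperation: reward R for both
    ("C", "D"): (0, 5),  # cooperator gets sucker S, defector gets temptation T
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: punishment P for both
}

History = List[Tuple[str, str]]  # (own move, opponent move) for each past round
Strategy = Callable[[History], str]


def always_cooperate(history: History) -> str:
    return "C"


def always_defect(history: History) -> str:
    return "D"


def tit_for_tat(history: History) -> str:
    # Cooperate in the first round, then copy the opponent's previous move.
    return "C" if not history else history[-1][1]


def play_ipd(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play `rounds` rounds and return the cumulative scores of (a, b)."""
    hist_a: History = []
    hist_b: History = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a), b(hist_b)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        # Each player sees the history from its own perspective.
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b


if __name__ == "__main__":
    print(play_ipd(always_cooperate, always_defect))  # (0, 50)
    print(play_ipd(tit_for_tat, always_defect))       # (9, 14)
    print(play_ipd(tit_for_tat, tit_for_tat))         # (30, 30)
```

An unconditional cooperator earns the maximum joint outcome against another cooperator but is fully exploited by a defector. A reciprocating strategy such as tit-for-tat limits that loss to a single round. This is the trade-off the abstract reports for agents steered toward higher Agreeableness and Conscientiousness.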
Paper Type: Short
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: LLM personality, LLM behaviors, decision-making, multi-agent, cooperation games, steering vectors, representation engineering
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 2735
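The keywords mention steering vectors and representation engineering. One common, simple way to build a trait steering vector is to take the difference of mean hidden activations between prompts that express a trait (e.g. high Agreeableness) and prompts that express its opposite, then add a scaled copy of that direction to the model's hidden state. The sketch below is an assumption for illustration and is not necessarily the paper's exact method; representation engineering also uses other readout techniques, such as PCA over activation differences. The function names and the toy 2-D vectors are hypothetical. In a real model, the vectors would be hidden states read from a chosen transformer layer, and `steer` would be applied at that layer during generation.

```python
import math
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    # Element-wise mean over a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Unit-normalized difference of means: mean(pos) - mean(neg)."""
    diff = [p - q for p, q in zip(mean(pos), mean(neg))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("positive and negative means coincide; no direction")
    return [d / norm for d in diff]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    # alpha > 0 pushes the state toward the trait, alpha < 0 pushes away from it.
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden: Vector, direction: Vector) -> float:
    # Scalar "trait score": how far the state lies along the steering direction.
    return sum(h * d for h, d in zip(hidden, direction))


if __name__ == "__main__":
    # Hypothetical activations for trait-high (pos) and trait-low (neg) prompts.
    pos = [[2.0, 1.0], [4.0, 1.0]]
    neg = [[0.0, 1.0], [0.0, 1.0]]
    d = steering_vector(pos, neg)
    h = steer([0.5, 0.5], d, alpha=2.0)
    print(d, h, projection(h, d))  # [1.0, 0.0] [2.5, 0.5] 2.5
```

The steering strength `alpha` is the knob that would correspond to "higher" or "lower" levels of a trait. Its useful range depends on the model and layer, so it would have to be tuned empirically.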