Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering

ACL ARR 2025 February Submission2735 Authors

15 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: As Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important. However, they often struggle with cooperation, leading to suboptimal outcomes. Inspired by Axelrod’s Iterated Prisoner’s Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation. Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making. Our results show that higher Agreeableness and Conscientiousness improve cooperation but increase susceptibility to exploitation, highlighting both the potential and limitations of personality-based steering for aligning AI agents.
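The exploitation trade-off described in the abstract can be illustrated with a toy Axelrod-style Iterated Prisoner's Dilemma. This is a minimal sketch under stated assumptions, not the authors' setup. It uses the classic Axelrod payoffs (T=5, R=3, P=1, S=0), which the paper may not use. The fixed strategies `always_cooperate`, `always_defect`, and `tit_for_tat` are hypothetical stand-ins for steered LLM agents. An unconditionally cooperative player, loosely analogous to a very high-Agreeableness agent, earns the sucker's payoff every round against a defector. A reciprocating player limits that loss.

```python
from typing import Callable, Dict, List, Tuple

# Classic Axelrod payoffs (assumption: the paper's payoff matrix may differ).
# Keys are (move_a, move_b); values are (payoff_a, payoff_b).
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),  # mutual cooperation: reward R
    ("C", "D"): (0, 5),  # A is exploited: sucker S vs temptation T
    ("D", "C"): (5, 0),  # A exploits B
    ("D", "D"): (1, 1),  # mutual defection: punishment P
}

# A strategy sees its own history and the opponent's history and returns "C" or "D".
Strategy = Callable[[List[str], List[str]], str]


def always_cooperate(mine: List[str], theirs: List[str]) -> str:
    # Hypothetical stand-in for a maximally cooperative agent.
    return "C"


def always_defect(mine: List[str], theirs: List[str]) -> str:
    # Hypothetical stand-in for an exploitative agent.
    return "D"


def tit_for_tat(mine: List[str], theirs: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"


def play_match(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play an iterated match and return the cumulative scores (score_a, score_b)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play_match(always_cooperate, always_defect))  # (0, 50)
    print(play_match(tit_for_tat, always_defect))       # (9, 14)
    print(play_match(tit_for_tat, tit_for_tat))         # (30, 30)
```

Over 10 rounds, unconditional cooperation scores 0 against a defector. Tit-for-tat is exploited only in the first round, and two cooperators reach the mutual-cooperation optimum of 30 each. This mirrors, in a highly simplified form, the abstract's finding that more cooperative dispositions raise cooperation but also raise exposure to exploitation.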

Paper Type: Short

Research Area: Computational Social Science and Cultural Analytics

Research Area Keywords: LLM personality, LLM behaviors, decision-making, multi-agent, cooperation games, steering vectors, representation engineering

Contribution Types: Model analysis & interpretability

Languages Studied: English

Submission Number: 2735