Abstract: We conduct a study in which large language models (LLMs) engage in open-ended dialogue and attempt to infer each other's identity without supervision or rewards. This setting gives rise to emergent behaviors: some models drop hints about their own identity, while others pretend to be human. GPT and Claude are identified frequently, likely owing to distinctive traits or broad exposure in training data, while models such as DeepSeek and Qwen remain nearly invisible. We analyze the linguistic and behavioral signatures that distinguish each model, and use free-text justifications to study the meta-strategies LLMs employ when making identity guesses. Finally, we show that identity recognition influences downstream decision-making: in post-dialogue economic games, models adjust their cooperative behavior according to whom they implicitly believe they are speaking with. These findings suggest that identity reasoning emerges spontaneously in open-ended model-to-model interaction, shaping both discourse and behavior in multi-agent settings.
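To make the protocol described in the abstract concrete, below is a minimal, self-contained Python sketch of one pairing: an anonymized open-ended dialogue, an identity probe that elicits a free-text justification, and a post-dialogue economic game. Everything here is an illustrative assumption on our part (the `Agent` signature, `run_dialogue`, the mock agents, the 10-point split); the paper's actual prompts, game design, and model pool may differ.

```python
from typing import Callable, List, Tuple

# An agent is any function mapping the transcript so far to its next message.
# This signature is an assumption; real backends would be wrapped to match it.
Agent = Callable[[List[Tuple[str, str]]], str]

CANDIDATES = ["GPT", "Claude", "DeepSeek", "Qwen"]


def run_dialogue(a: Agent, b: Agent, turns: int = 6) -> List[Tuple[str, str]]:
    """Open-ended dialogue between two anonymized agents (no labels, no rewards)."""
    transcript: List[Tuple[str, str]] = []
    for t in range(turns):
        name, agent = ("A", a) if t % 2 == 0 else ("B", b)
        transcript.append((name, agent(transcript)))
    return transcript


def guess_identity(agent: Agent, transcript: List[Tuple[str, str]]) -> str:
    """Elicit an identity guess plus a free-text justification."""
    probe = ("PROBE", f"Which model were you talking to "
                      f"({', '.join(CANDIDATES)})? Justify briefly.")
    return agent(transcript + [probe])


def dictator_game(agent: Agent, transcript: List[Tuple[str, str]]) -> str:
    """Post-dialogue economic game: propose a split of 10 points."""
    probe = ("PROBE", "Split 10 points between yourself and your dialogue "
                      "partner. How many do you give them?")
    return agent(transcript + [probe])


# Mock agents so the sketch runs offline; swap in real LLM calls in practice.
def make_mock(name: str) -> Agent:
    def step(transcript: List[Tuple[str, str]]) -> str:
        last = transcript[-1][1] if transcript else ""
        if "Which model" in last:
            return "Probably Claude; the hedged, polite tone is a tell."
        if "Split 10 points" in last:
            return "I give them 5."  # cooperation may shift with the belief
        return f"[{name}] Interesting, say more?"
    return step


if __name__ == "__main__":
    a, b = make_mock("agent_a"), make_mock("agent_b")
    t = run_dialogue(a, b)
    print(guess_identity(a, t))
    print(dictator_game(a, t))
```

Keeping the identity probe and the economic game as ordinary turns appended to the shared transcript mirrors the abstract's claim that recognition is implicit: the game prompt never reveals the partner's identity, so any shift in cooperation must come from the agent's own inference.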
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM/AI agents, applications, safety and alignment
Contribution Types: Model analysis & interpretability
Languages Studied: English, Chinese
Submission Number: 3506