Strategic Generalization Without Interaction: Can Post-Training Alone Induce Multi-Agent Behavior?

08 Sept 2025 (modified: 28 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Strategic Generalization, Large Language Models, Post-training, Multi-Agent Systems, Economic Reasoning, Game Theory
Abstract: Directly training Large Language Models (LLMs) for Multi-Agent Systems (MAS) remains challenging due to intricate reward modeling, dynamic agent interactions, and demanding generalization requirements. This paper explores whether post-training techniques can generalize effectively to multi-agent scenarios $\textit{without any interactive multi-agent data}$. We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory, its demand for structured analytical reasoning, and its relevance to real-world applications such as market design, resource allocation, and policy analysis. We introduce $\textbf{Recon}$ (Reasoning like an ECONomist), a 7B-parameter open-source LLM post-trained on a hand-curated dataset of 2,100 high-quality economic reasoning problems. Comprehensive evaluations show that Recon substantially improves performance on economic reasoning benchmarks and generalizes to unseen multi-agent games, exhibiting equilibrium-seeking behavior. To our knowledge, this is the first systematic study to demonstrate that domain-aligned post-training can induce emergent strategic behavior in multi-agent settings. These findings underscore post-training as a scalable route to structured reasoning and agent alignment, shedding light on the respective roles of supervised fine-tuning (SFT) and reinforcement learning (RL) in cultivating emergent behaviors.
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 3169