Keywords: Strategic Generalization, Large Language Models, Post-training, Multi-Agent Systems, Economic Reasoning, Game Theory
Abstract: Directly training Large Language Models (LLMs) for Multi-Agent Systems (MAS) remains challenging due to intricate reward modeling, dynamic agent interactions, and demanding generalization requirements.
This paper explores whether post-training techniques can effectively generalize to multi-agent scenarios $\textit{without any interactive multi-agent data}$.
We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory, its demand for structured analytical reasoning, and its relevance to real-world applications such as market design, resource allocation, and policy analysis.
We introduce $\textbf{Recon}$ (Reasoning like an ECONomist), a 7B-parameter open-source LLM post-trained on a hand-curated dataset of 2,100 high-quality economic reasoning problems.
Comprehensive evaluations show that Recon substantially improves performance on economic reasoning benchmarks and generalizes to unseen multi-agent games, where it exhibits equilibrium-seeking behavior.
To our knowledge, this is the first systematic study to demonstrate that domain-aligned post-training can induce emergent strategic behavior in multi-agent settings.
These findings underscore post-training as a scalable route to structured reasoning and agent alignment, shedding light on the roles of supervised fine-tuning (SFT) and reinforcement learning (RL) in cultivating emergent behaviors.
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 3169