Abstract: Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
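To make the autoregressive factorization idea concrete, here is a minimal illustrative sketch (not the released implementation): instead of scoring every joint action for all units at once, the policy selects one unit's order at a time, conditioning each choice on the orders already fixed. The names `select_joint_action`, `llm_policy`, and `legal_orders` are hypothetical placeholders, not identifiers from the paper or repository.

```python
from typing import Callable, Dict, List

def select_joint_action(
    state_text: str,
    units: List[str],
    legal_orders: Dict[str, List[str]],
    llm_policy: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Factorize a multi-unit turn into a sequence of unit-level decisions."""
    chosen: Dict[str, str] = {}
    for unit in units:
        # Build a prompt containing the board state and the orders already
        # assigned, so later units can coordinate with earlier choices.
        prompt = (
            f"State:\n{state_text}\n"
            f"Orders so far: {chosen}\n"
            f"Choose an order for unit {unit}."
        )
        # The LLM-based policy returns one legal order for this unit.
        chosen[unit] = llm_policy(prompt, legal_orders[unit])
    return chosen
```

Under this sketch, the number of decisions grows linearly with the number of units rather than combinatorially with the joint action space, which is the benefit the abstract attributes to the factorization framework.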
Lay Summary: *Diplomacy* is a complex board game where seven players must negotiate, form alliances, and compete to control Europe — making it especially challenging for AI to master. Traditional AI systems learned by simulating millions of matches, demanding huge amounts of data and computing power.
We present DipLLM, a new AI agent built on large language models — the kind behind tools like ChatGPT. Unlike previous systems, DipLLM learns from only a small number of games. It doesn’t try to evaluate every possibility; instead, it breaks down complex decisions into simpler steps, deciding what each unit should do through step-by-step reasoning.
Despite using just 1.5% of the data that top AI systems relied on, DipLLM performs even better. This shows that large language models, when fine-tuned thoughtfully, can handle complex multiplayer strategy games efficiently — opening the door to more accessible and general-purpose game-playing agents.
Link To Code: https://github.com/KaiXIIM/dipllm
Primary Area: Reinforcement Learning->Multi-agent
Keywords: LLM-based agent, Diplomacy, fine-tuning
Submission Number: 2754