Generalizable Opponent Exploitation in LLM Agents via Mixed Best-Responses Training

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: LLM Agent, Opponent Exploitation, Generalization
Abstract: Opponent exploitation is a crucial capability for agents in competitive settings, allowing them to exploit weaknesses in their opponents' strategies. Agents based on Large Language Models (LLMs) have demonstrated remarkable strategic reasoning and adversarial decision-making, yet their ability to exploit diverse opponents, including those following suboptimal strategies, remains underexplored. In this work, we introduce \textbf{GOE-LLM} (Generalizable Opponent Exploitation with LLMs), a novel framework that leverages LLMs to learn opponent-exploitation strategies through mixed best-response training in two-player zero-sum games. A Multi-Layer Perceptron (MLP) Profiler is pre-trained independently to analyze opponent behaviors and identify their strategic patterns. This profiling information is then used by a fine-tuned LLM Exploiter, trained with Group Relative Policy Optimization (GRPO) on a curated set of best-response strategies against heterogeneous opponents. To ensure stable training while enabling the resulting agent to generalize across a broad spectrum of opponents, we propose a Mixture-Best-Responses principle that guides the construction of training data. We evaluate GOE-LLM with LLMs of various sizes in Kuhn Poker, where it demonstrates strong exploitation of out-of-distribution opponents, and it shows consistent performance and generalization trends in Leduc Hold'em Poker. We construct and compare different mixtures of training data to validate the effectiveness of the Mixture-Best-Responses principle, confirming its role in ensuring both stability and generalization, and extensive ablation studies further validate the contribution of each component to overall performance. Our results highlight the potential of GOE-LLM for generalizable opponent exploitation and demonstrate the effectiveness of mixed best-response training in enhancing the adaptability of LLM agents.
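The abstract describes building training data as a mixture of best-response examples against heterogeneous opponents. The following is a minimal sketch of what such a data-construction step could look like; the opponent types, mixture weights, and the `build_mixed_br_dataset` helper are illustrative assumptions, not the paper's actual implementation.

```python
import random

def build_mixed_br_dataset(br_datasets, mixture_weights, n_samples, seed=0):
    """Sample a training set from per-opponent best-response datasets.

    br_datasets: dict mapping opponent type -> list of (state, best_action) pairs.
    mixture_weights: dict mapping opponent type -> sampling weight.
    Returns a list of (opponent_type, (state, best_action)) examples, so the
    resulting agent sees best responses against a mixture of opponent styles.
    """
    rng = random.Random(seed)
    types = list(br_datasets)
    weights = [mixture_weights[t] for t in types]
    dataset = []
    for _ in range(n_samples):
        # Draw an opponent type according to the mixture, then one of its
        # best-response examples uniformly at random.
        t = rng.choices(types, weights=weights, k=1)[0]
        dataset.append((t, rng.choice(br_datasets[t])))
    return dataset

# Toy Kuhn Poker illustration: three hypothetical opponent styles with
# hand-written best-response labels (for demonstration only).
br_data = {
    "always_bet":   [("K: facing bet", "call"), ("J: facing bet", "fold")],
    "always_check": [("Q: first to act", "bet")],
    "near_nash":    [("J: first to act", "check")],
}
weights = {"always_bet": 0.4, "always_check": 0.3, "near_nash": 0.3}
mixed = build_mixed_br_dataset(br_data, weights, n_samples=100)
```

Under this reading, adjusting `mixture_weights` trades off exploiting any single opponent style against generalizing across the pool, which is the tension the Mixture-Best-Responses principle is said to balance.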
Supplementary Material: zip
Primary Area: generative models
Submission Number: 24378