Sparse Policy Space Response Oracles

ICLR 2026 Conference Submission 13416 Authors

18 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Policy Space Response Oracles, policy sparsity metric, policy space sparsification, exploitability
TL;DR: This paper proposes Sparse PSRO, which couples a policy sparsity metric with policy space sparsification to learn stronger solutions faster.
Abstract: In multi-agent non-transitive games, the Policy Space Response Oracles (PSRO) framework approximates a Nash Equilibrium by iteratively expanding a policy population. However, the framework suffers from severe policy redundancy during policy generation and population construction, which substantially increases computational complexity. To address this limitation, this paper proposes Sparse PSRO, a novel framework that overcomes policy redundancy through two key innovations: (1) a Sparsity Metric, which quantifies the dissimilarity between a candidate strategy and the existing population via convex combination residual constraints, guiding the algorithm to explore underrepresented payoff regions while suppressing redundant policy generation; and (2) Policy Space Sparsification, which constructs the Policy Hull backbone via intensive early exploration and admits only geometrically distinct strategies through threshold control, effectively reducing the number of policies and lowering computational complexity. Theoretical analysis proves that Sparse PSRO maintains a finite policy population with guaranteed separation distances, preventing exponential population growth while ensuring convergence to the Nash Equilibrium. Experiments across diverse environments (including RGoS, AlphaStar888, Blotto, and Kuhn Poker) demonstrate that Sparse PSRO significantly outperforms six baseline methods in both exploitability and policy population size, validating its effectiveness in efficiently approximating the Nash Equilibrium at reduced computational cost.
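To make the two components concrete, the sketch below illustrates one plausible reading of the abstract: it assumes the sparsity metric is the Euclidean residual of projecting a candidate's empirical payoff vector onto the convex hull of the current population's payoff vectors (the "convex combination residual"), with threshold control deciding admission. The function names, the Frank-Wolfe solver, and the threshold `tau` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sparsity_metric(v, M, iters=200):
    """Hypothetical sparsity metric: distance from candidate payoff
    vector v to the convex hull of existing population payoff vectors
    (the columns of M).

    Solves min_w ||v - M w||_2  s.t.  w >= 0, sum(w) = 1
    with the Frank-Wolfe method (simplex vertices as linear minimizers).
    """
    n = M.shape[1]
    w = np.full(n, 1.0 / n)                  # start at the hull's barycenter
    for t in range(iters):
        grad = 2.0 * M.T @ (M @ w - v)       # gradient of the squared residual
        i = int(np.argmin(grad))             # best simplex vertex
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        w = (1.0 - gamma) * w + gamma * np.eye(n)[i]
    return np.linalg.norm(v - M @ w)

def admit(v, M, tau=0.05):
    """Threshold control (tau is an illustrative value): admit a candidate
    only if it is geometrically distinct, i.e. its convex-combination
    residual exceeds tau."""
    return sparsity_metric(v, M) > tau

# Toy usage: columns of M are existing policies' payoff vectors.
M = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
print(admit(np.array([0.5, 0.5]), M))    # inside the hull -> False (redundant)
print(admit(np.array([2.0, 2.0]), M))    # far outside the hull -> True (novel)
```

Under this reading, a candidate whose payoff vector already lies inside the Policy Hull yields a near-zero residual and is rejected as redundant, which is how threshold control would keep the population finite and geometrically separated.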
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13416