Instance-dependent Sample Complexity for Bilinear Saddle-Point Optimization with Noisy Feedback: An LP-Based Approach

Published: 28 Nov 2025, Last Modified: 30 Nov 2025, NeurIPS 2025 Workshop MLxOR, CC BY 4.0
Keywords: learning in games, instance-dependent guarantees, linear programming, zero-sum games
Abstract: We study the sample complexity of obtaining a Nash equilibrium (NE) estimate in two-player zero-sum matrix games with noisy feedback. Specifically, we propose a novel algorithm that repeatedly solves linear programs (LPs) to obtain an NE estimate with bias at most $\epsilon$ using $O\left(\frac{m_1 m_2}{\epsilon \min\{\delta^2, \sigma_0^2, \sigma^3\}} \log\frac{m_1 m_2}{\epsilon}\right)$ samples for general $m_1 \times m_2$ game matrices, where $\sigma$, $\sigma_0$, and $\delta$ are problem-dependent constants. To our knowledge, this is the first instance-dependent sample complexity bound for finding an NE estimate with $\epsilon$ bias in general-dimension matrix games with noisy feedback and potentially non-unique equilibria. Our algorithm builds on recent advances in online resource allocation and operates in two stages: (1) identifying the support set of an NE, and (2) computing the unique NE restricted to this support. Both stages rely on a careful analysis of LP solutions derived from noisy samples.
Submission Number: 50
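
As a rough illustration of stage (2) in the abstract, the sketch below estimates a payoff matrix by averaging noisy samples and then computes the row player's equilibrium strategy restricted to a given support. It is not the paper's algorithm: the paper works through LP solutions, whereas this sketch swaps in a direct solve of the indifference equations, and it assumes the support is already known, square, and admits a unique fully mixed equilibrium. The function names (`estimate_matrix`, `solve_linear`, `row_strategy_on_support`), the example matrix, the Gaussian noise model, and the sample count are all hypothetical choices made for illustration, and the row player is assumed to be the maximizer.

```python
import random


def estimate_matrix(sample, m1, m2, n):
    """Plug-in estimate: average n noisy samples of every entry."""
    return [[sum(sample(i, j) for _ in range(n)) / n for j in range(m2)]
            for i in range(m1)]


def solve_linear(M, b):
    """Solve M z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system: support assumption violated")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def row_strategy_on_support(A, rows, cols):
    """Row player's mixed strategy x and game value v on a square support.

    Solves sum_i x_i * A[i][j] = v for every j in cols, with sum_i x_i = 1.
    """
    k = len(rows)
    # One indifference equation per supported column; unknowns are (x, v).
    M = [[A[i][j] for i in rows] + [-1.0] for j in cols]
    M.append([1.0] * k + [0.0])  # normalization: probabilities sum to one
    b = [0.0] * k + [1.0]
    z = solve_linear(M, b)
    return z[:k], z[k]


# Hypothetical 2x2 game with a fully mixed equilibrium, observed with noise.
A_true = [[2.0, -1.0], [-1.0, 1.0]]
rng = random.Random(0)  # fixed seed so the run is reproducible


def noisy_entry(i, j):
    """One noisy observation of entry (i, j); Gaussian noise is an assumption."""
    return A_true[i][j] + rng.gauss(0.0, 0.5)


A_hat = estimate_matrix(noisy_entry, 2, 2, n=2000)
x_hat, v_hat = row_strategy_on_support(A_hat, rows=[0, 1], cols=[0, 1])
print("estimated strategy:", x_hat, "estimated value:", v_hat)
```

On the noiseless matrix the indifference equations give the strategy (0.4, 0.6) with value 0.2, so the estimate from averaged samples should land close to those values. Identifying the support itself, which is stage (1), is not attempted here.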