Keywords: Stochastic differential games, ergodic control, Thompson sampling, optimal non-linear filtering
Abstract: We study a stochastic differential game with $N$ competitive players in a linear-quadratic framework with ergodic cost, where the state dynamics are governed by $d$-dimensional diffusion processes with an unknown common drift matrix. Assuming a Gaussian prior on the drift, we use filtering techniques to update its posterior estimates. Based on these estimates, we propose a Thompson-sampling-based algorithm with dynamic episode lengths to approximate strategies. We show that the Bayesian regret for each player has an error bound of order $O(\sqrt{T\log(T)})$, where $T$ is the time horizon, independent of the number of players. This implies that the average regret per unit time goes to zero. Finally, we prove that the algorithm results in a Nash equilibrium.
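As a hedged illustration of the ideas in the abstract, the sketch below simulates Thompson sampling for a scalar, single-player ergodic linear-quadratic problem with an unknown drift: a Gaussian prior on the drift is updated by conjugate (Kalman-type) filtering, and a fresh drift sample is drawn at the start of each episode, with episode lengths growing over time as a stand-in for the paper's dynamic episode schedule. The function name `ts_lq`, the doubling schedule, the discretization step, and all parameter values are illustrative assumptions, not the paper's construction, which treats $N$ players and $d$-dimensional dynamics.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): Thompson sampling for a
# scalar ergodic LQ problem with unknown drift a.
# Euler-discretized dynamics: dx = (a*x + u) dt + dW, running cost x^2 + u^2.
# Episode doubling, dt, and the prior N(0, 1) are all assumed for illustration.

def ts_lq(a_true=-0.5, T=200.0, dt=0.01, seed=0):
    mu, prec = 0.0, 1.0      # Gaussian prior N(0, 1/prec) on the drift a
    rng = np.random.default_rng(seed)
    x, t = 0.0, 0.0
    ep_len = 1.0             # episode lengths double each episode (assumption)
    while t < T:
        # Thompson sample of the drift from the current posterior
        a_hat = mu + rng.standard_normal() / np.sqrt(prec)
        # Scalar ergodic Riccati gain for cost x^2 + u^2: K = a + sqrt(a^2 + 1)
        K = a_hat + np.sqrt(a_hat**2 + 1.0)
        t_end = min(t + ep_len, T)
        while t < t_end:
            u = -K * x
            dw = np.sqrt(dt) * rng.standard_normal()
            dx = (a_true * x + u) * dt + dw
            # Conjugate Gaussian update from the observation
            # (dx - u dt) = a * (x dt) + N(0, dt)
            z, y = x * dt, dx - u * dt
            prec_new = prec + z * z / dt
            mu = (prec * mu + z * y / dt) / prec_new
            prec = prec_new
            x += dx
            t += dt
        ep_len *= 2.0
    return mu, 1.0 / prec    # posterior mean and variance of the drift

mu, var = ts_lq()
```

Within each episode the controller acts on the sampled drift via the certainty-equivalent Riccati feedback; between episodes the posterior concentrates, so later samples (and hence later policies) are increasingly accurate, which is the mechanism behind the sublinear regret claimed in the abstract.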
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 14044