Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation

Published: 07 May 2025 · Last Modified: 13 Jun 2025 · UAI 2025 Poster · CC BY 4.0
Keywords: reinforcement learning, imperfect information extensive-form games
TL;DR: The first algorithm for provably efficient learning of imperfect information extensive-form games with linear function approximation.
Abstract: Despite significant advances in learning imperfect information extensive-form games (IIEFGs), most existing theoretical guarantees are limited to IIEFGs in the tabular case. To permit efficient learning of large-scale IIEFGs, we take the first step in studying two-player zero-sum IIEFGs with linear function approximation. In particular, we consider linear IIEFGs in the formulation of partially observable Markov games (POMGs) with linearly parameterized rewards. To address the challenge that the underlying function approximation structure is difficult to apply directly due to the imperfect information of states, we construct composite "feature vectors" for information set-action pairs. Building on these, we further propose a least-squares loss estimator, which we call the *fictitious* least-squares loss estimator. By integrating this estimator with the follow-the-regularized-leader (FTRL) framework, we propose the *fictitious* least-squares follow-the-regularized-leader ($\text{F}^2\text{TRL}$) algorithm, which achieves a provable $\widetilde{\mathcal{O}}(\lambda\sqrt{d H^2 T})$ regret guarantee in the large-$T$ regime, where $d$ is the ambient dimension of the feature mapping, $H$ is the horizon length, $\lambda$ is a "balance coefficient", and $T$ is the number of episodes. At the core of the analysis of $\text{F}^2\text{TRL}$ is our proposed new "balanced transition" over the information set-action space. Additionally, we complement our results with an $\Omega(\sqrt{d\min(d,H)T})$ regret lower bound for this problem and conduct empirical evaluations across various environments, which corroborate the effectiveness of our algorithm.
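For intuition, the abstract's combination of a least-squares loss estimate with FTRL follows a standard template; below is a minimal generic sketch, not the paper's exact construction. The symbols $\Pi$ (sequence-form strategy polytope), $\phi(x,a)$ (feature of an information set-action pair), $\psi$ (regularizer), $\eta$ (learning rate), and $\gamma$ (ridge parameter) are assumptions introduced for illustration only; the paper's fictitious estimator and balanced transition are not reproduced here:
$$\widehat{\theta}_t = \big(\gamma I + \phi(x_t,a_t)\phi(x_t,a_t)^{\top}\big)^{-1}\phi(x_t,a_t)\,\ell_t, \qquad \widehat{\ell}_t(x,a) = \big\langle \phi(x,a), \widehat{\theta}_t \big\rangle,$$
$$\mu_{t+1} = \operatorname*{arg\,min}_{\mu\in\Pi}\ \eta\,\Big\langle \mu, \sum_{s\le t}\widehat{\ell}_s\Big\rangle + \psi(\mu),$$
i.e., a ridge-style linear estimate of the loss from the observed trajectory, followed by a regularized minimization of the cumulative estimated loss over the strategy space.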
Latex Source Code: zip
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission327/Authors, auai.org/UAI/2025/Conference/Submission327/Reproducibility_Reviewers
Submission Number: 327