% \vspace{-0.5cm}
\section{Conclusions}\label{sec:conclusion}
% \vspace{-0.2cm}
In this work, we take the first step towards provably efficient learning of two-player zero-sum IIEFGs with linear function approximation, in the formulation of POMGs with linearly realizable rewards.
We prove that the proposed \LSFTRL algorithm attains a regret of order $\widetilde{\gO}(\lambda H\sqrt{ d T})$ in the large-$T$ regime.
We accomplish this by devising a \textit{fictitious} least-squares loss estimator for this problem, together with a new ``balanced transition'' over the infoset-action space, which might be of independent interest.
Moreover, we establish an $\Omega(\sqrt{d\min(d,H)T})$ regret lower bound for this problem and conduct empirical evaluations on various environments, which validate the advantages of the \LSFTRL algorithm.
Several interesting directions remain open. One natural question is how to obtain high-probability guarantees for this challenging problem, so as to find an approximate NE with high probability.
We believe our results can be extended to high-probability ones using self-concordant barrier potential functions and increasing learning rates \citep{LeeLWZ20}.
Another question is whether the proposed algorithm can be generalized to multi-player general-sum IIEFGs.
We believe the results of this work shed light on learning large-scale IIEFGs, and we leave these extensions to future work.

\section*{Acknowledgements}
The corresponding author Shuai Li is partly supported by the Guangdong Provincial Key Laboratory of Mathematical Foundations for Artificial Intelligence (2023B1212010001).


% \section*{Limitations}
% This work studies learning zero-sum IIEFGs with linearly parameterized rewards. The linear function approximation setting over rewards might not be satisfied in complex and high-dimensional scenarios in practice, yet we believe this is indeed a necessary first step towards a more complex function approximation setting. We believe it is possible and interesting to extend the proposed algorithm as well as the results in this work to the nonlinear function approximation settings (\textit{e.g.}, low Eluder dimension \citep{RussoR13}), which we leave as our future study.

% \section*{Broader Impact}
% This paper aims to make the first step towards provably efficient learning of IIEFGs with linear function approximation. Most of the results in this paper are purely theoretical and we are currently not aware of any potential negative societal consequences of this paper.

% --------------------------------------------------------------------------------------
% This paper presents the first line of algorithms learning  two-player zero-sum IIEFGs with linear function approximation. We design a novel unbiased estimator and provide two algorithms utilizing it. \LSOMD manages to attain $\widetilde{\gO}(\sqrt{HX^2d\alpha^{-1}T})$, removing dependency on $A$, yet holds under certain assumption. \LSFTRL can achieves $\widetilde{\gO}(\sqrt{H^2d\lambda^{-1}T})$, removing reliance on both $X$ and $A$ factors and leaving a game-tree-structure-related constant $\lambda$. It also enjoys a worst case performance guarantee of $\widetilde{\gO}(\sqrt{HXdT})$. Moreover, our work lays the groundwork for more advanced level-based estimators beyond our least squares estimator, opening research avenues like designing estimators better leveraging game tree structure. Currently our sample complexity results hold in expectation but not with high probability. However, we believe our work paves the way for solving open problems in large-scale IIEFGs. 
% --------------------------------------------------------------------------------------