









    




\section{Introduction}\label{sec:introduction}
\input{introicml.tex}
\begin{comment}
 Given  a quantum measurement  $0\preceq R\preceq I$ the Best Separable State (BSS) problem asks for a product state $\rho \otimes \sigma$ that maximizes the probability of passing the test $R$,  e.g. see  \cite{barak2017quantum} and references therein. In terms of optimization this corresponds to  maximizing a bilinear function  over the product of density matrices, i.e., 
 \begin{equation}\label{BSS}\tag{BSS}
        \max \{ \Tr(R(\rho\otimes \sigma)):  \rho \in \da, \sigma \in \db\},
\end{equation} The set of \emph{separable quantum states} on the joint system $\da\otimes \db $ is by definition the convex hull of the set of product states $\rho \otimes \sigma$. 
Consequently, the \ref{BSS} problem 
 corresponds to linear optimization over the convex compact set of separable quantum states, whose 
  complexity is  closely related to the problem of deciding whether a bipartite quantum state is separable \cite{grotschel2012geometric, ioannou2006computational}. The problem of testing whether a state is separable  is 
     computationally intractable \cite{gurvits2003classical, gharibian2008strong} and has been studied extensively, e.g. see  
\cite{ioannou2006computational, bruss2002characterizing, terhal2002detecting, doherty2004complete, barak2017quantum}.

An extremely useful   approach to  approximate  the value of the BSS  problem is to identify outer approximations to the set of separable states over which linear optimization is efficient.  These outer approximations typically arise from necessary conditions that are satisfied by separable states. An important  example is the {\em Positive Partial Transpose}~(PPT)  criterion for separability~\cite{peres1996separability, horodecki1996necessary}, stating  that a  necessary condition for $\rho$ to be separable is that the partial transpose $\rho^{T_\mathcal{B}}$ (or $\rho^{T_\mathcal{A}}$)  is positive semidefinite. Consequently, the value of the BSS problem is upper bounded by the value of  a semidefinite programming (SDP) problem.  This approach was later generalized to the DPS hierarchy \cite{doherty2004complete} of outer semidefinite programming approximations  that converges to the set of separable states. The   first level of the DPS hierarchy corresponds to the PPT criterion. 



 A more general problem with  many important  applications in quantum information theory is the  problem of bilinear optimization over  density matrices subject to  affine constraints. Similar to the DPS hierarchy there exist hierarchies of outer SDP  approximations for  problems of that type, e.g., see \cite{berta,berta2022semidefinite}. A complementary approach for solving such problems  is a non-commutative extension   of the  branch-and-bound algorithm \cite{marco}.


Finally, as the pure product states $xx^\dagger \otimes y y^\dagger$ are the extreme points of the set of separable quantum states, the~\ref{BSS} problem always has a pure product state as a solution. The \ref{BSS} problem is thus the mixed extension of the problem of biquadratic optimization over the product of spheres 
$$\max\{(x\otimes y)^\dagger R(x\otimes y):\|x\|_2=1, {\|y\|_2=1}\},$$ and attains the same value.
This problem has also been studied extensively, and has applications ranging from the strong ellipticity condition problem \cite{simpson1982copositive,han2009conditions} to tensor approximation \cite{biquadratic2}. 


In this work we introduce an algorithm  for the   \ref{BSS} problem when  the test $R$ is unknown and $\rho, \sigma$ are updated in a decentralized manner  using    first-order feedback $\nabla_\rho \Tr(R(\rho\otimes \sigma))$ and $\nabla_\sigma \Tr(R(\rho\otimes \sigma))$ respectively. To achieve  this, we reinterpret the BSS problem as a quantum common-interest game (QCIG) where  the two players have density matrices $\rho, \sigma$ as strategies and share a common bilinear utility function $\Tr(R(\rho\otimes \sigma))$.  Following   the modern paradigm of learning in games we  search for   algorithms that achieve  global optimality in a distributed fashion, i.e.,  where   each  player  individually updates his own  state  taking into account  past  interactions.


  

 Learning in games has emerged as a powerful tool for Machine Learning with numerous applications. Although some of the most well known success stories, such as Generative Adversarial Networks~\cite{goodfellow2014generative}, solving Go~\cite{silver2016mastering}, and Poker~\cite{moravvcik2017deepstack} are based on zero-sum games, recently and increasingly the focus is shifting towards the much harder domain of cooperative settings~\cite{dafoe2021cooperative,dafoe2020open}.
Games of identical/common interest are an important step in this direction. The cooperative card game Hanabi~\cite{bard2020hanabi} and robot soccer~\cite{kitano1997robocup} are such examples, where all agents share the same goals and try to optimize the same function. What makes such problems so hard is that i) the agents are trying to optimize a highly non-convex optimization problem, ii)  the agents have constraints on the allowable set of mixed strategies, and iii) the  learning dynamics of each agent should only depend on their own payoff vectors (gradient, first-order information) without knowledge about the other agents (known as decentralized~dynamics).



 
 \paragraph{BSS as a quantum common-interest game.} 

In a  quantum CIG game, there are two agents Alice and Bob that
 control  quantum registers $\mathcal{A}$ and $\mathcal{B}$ and their strategies are given by density matrices in    $D(\mathcal{A})$ and $ D(\mathcal{B})$ respectively. Upon playing strategy profile $(\rho,\sigma)\in \ D(\mathcal{A})\times  D(\mathcal{B})$  both players receive a common utility $u(\rho, \sigma) = \langle R, \rho\otimes\sigma\rangle$,  where $R$ is a Hermitian positive semidefinite matrix.
Quantum CIGs  are a non-commutative generalization of  classical normal-form CIGs. In the typical two-agent version that we focus on, the agents select simplex vectors   $ x\in \Delta_\mathcal{A}, y \in \Delta_\mathcal{B} $ 
and receive a common bilinear utility $ x^\top A y$. From the perspective of first-order dynamics,
CIGs are equivalent to potential games~\cite{sandholm2010population},
a class of games that  has been widely studied due to its connections to Cournot competition \cite{monderer1996potential}, congestion games \cite{rosenthal1973class}, and various other theoretical and engineering applications, see e.g.   \cite{marden2009cooperative, zeng2018potential, he2019game, della2016potential}.

 
 As both agents in a classical  CIG are maximizing  the same utility function, there is a natural connection between the game and the bilinear optimization problem 
$\max  \{ x^\top A y: x\in \Delta_\mathcal{A}, y \in \Delta_\mathcal{B}\}$, and specifically the optimization problem's KKT  points correspond to Nash equilibria of the CIG \cite{sandholm2010population}. Although it is non-convex, this problem is easy to solve if the common payoff matrix is known. On the other hand, in the setting where $A$ is unknown and  only first-order information is accessible from the utility function, 
several `natural' learning dynamics including  the replicator dynamics and  smooth fictitious play have been shown to converge to equilibrium sets when applied to potential games, see e.g. \cite{hofbauer1998evolutionary, hofbauer2002global, kleinberg2009multiplicative}.

Of particular relevance to this work are the continuous-time replicator dynamics
\begin{equation}\label{replicator}\tag{\rm{REP}}
\dot{x}_i=x_i((Ay)_i-x^\top Ay), 
\end{equation}
written only in terms  of the $x$-player, or equivalently:
\begin{equation}\label{rep-exp}
x_i(t)={\exp(s_i(t))\over \sum_j\exp(s_j(t))} \ \text{ where }\  s_i(t)=\int_0^t(Ay(\tau))_i\,d\tau,
\end{equation}
e.g. see \cite{taylor1978evolutionary,hofbauer1998evolutionary,bomze1983lotka,weibull1997evolutionary}. The replicator  dynamics can be also seen as a gradient flow with respect to the Shahshahani metric 
$\langle v,w\rangle_x=\sum_i{\frac{1}{x_i}}v_iw_i$~\cite{shahshahani1979new}.  There are two main discrete-time versions  for the replicator~dynamics:  
  \begin{align}
    x_i&\leftarrow x_i{(Ay)_i \over x^\top Ay} \label{linear}\tag{\rm{MWU$_\ell$}}\\
    x_i&\leftarrow x_i { \exp((Ay)_i)\over \sum_ix_i\exp((Ay)_i)} \label{exp} \tag{\rm{MWU$_e$}},
  \end{align}
 
 where we refer to the first one as the   {\em linear multiplicative weights update} and the second one as  the {\em exponential multiplicative weights update}, see, e.g.,~\cite{hofbauer2003evolutionary,arora2012multiplicative,palaiopanos2017multiplicative, freund1997decision}. \ref{linear} is a discrete version of \ref{replicator} in the sense that, in both cases, the weight of a strategy increases if and only if it performs better than average; \ref{exp} is simply \eqref{rep-exp} written recursively for the case where the payoffs are seen in discrete time intervals.
 
 
 In continuous time, the replicator dynamics are known to converge to Nash equilibria in CIGs \cite{hofbauer1998evolutionary,kleinberg2009multiplicative}. 

 Moreover, careful stability analysis has shown that both replicator dynamics as well as its discretization typically converge to pure (i.e. non-randomized) equilibria, which correspond to second-order KKT points instead of first-order KKT/mixed Nash in generic CIGs~\cite{kleinberg2009multiplicative, mertikopoulos2016learning,panageas2019multiplicative}.


\input{arxiv/our_results}






\paragraph{Prior work on quantum games and matrix multiplicative weights updates.} The game-theoretic model in which two players select density matrices and receive possibly different bilinear utilities has been studied before \cite{jain2009parallel, ickstadt2022semidefinite}. Nevertheless, the main focus of those works is on the setting of competing players (i.e., zero-sum games). Specifically, Jain and Watrous \cite{jain2009parallel} introduce a parallel algorithm for computing Nash equilibria, whereas Ickstadt et al. \cite{ickstadt2022semidefinite} focus on structural results. 

Underlying the results in \cite{jain2009parallel}  is the celebrated matrix multiplicative weights update (MMWU) introduced by Arora and Kale \cite{arora2005fast, tsuda2005matrix, kale2007efficient, arora2012multiplicative}, which is a generalization of \ref{exp} to density matrices. 

MMWU has found many applications: important examples include solving SDPs \cite{arora2007combinatorial}, proving the QIP=PSPACE result in quantum information theory  \cite{jain2011qip}, finding balanced separators \cite{orecchia2012approximating}, and spectral sparsification~\cite{allen2015spectral}. MMWU was originally devised for performing online optimization over density matrices and solving SDPs, and was used in the aforementioned applications either as a theoretical tool or as a classical algorithm. The quantum implementability of MMWU was studied later in \cite{van2020quantum, brandao2017quantum, brandao2017quantum2}, which devised quantum SDP solvers by ``quantizing'' the classical MMWU-based SDP solvers. 

\paragraph{Comparing replicator formulations from prior work.}
Recently, Jain et al. \cite{jain2022matrix} also studied the dynamical properties of MMWU in zero-sum games via a matrix formulation of replicator dynamics, which is distinct from the \ref{eqn:_QREP} dynamics in our work. 

[ADD ADDITIONAL comparison here: classical also has 2 types of replicator, there are differences in performances, see experiments in appendix]

\end{comment}



















\section{Quantum Common-Interest Games and the BSS problem}\label{sec:QCIG}



\paragraph{Quantum preliminaries.}
A $d$-dimensional quantum register is mathematically described as the set of unit vectors  in a   $d$-dimensional Hilbert space $\mathcal{H}.$
The \emph{state} of a qudit quantum  register $ \mathcal{H}$ is represented by a \emph{density matrix}, i.e.,  a $d\times d$ Hermitian positive semidefinite matrix with trace equal to one. The state space of a quantum register $\mathcal{H}$ is denoted  by  $D(\mathcal{H})$.
When two quantum registers with associated spaces $\mathcal{A}$ and $\mathcal{B} $ of dimension $n$ and $m$ respectively are considered as a joint quantum register, the associated state  space is given by the density operators  on the tensor product space, i.e., $D(\mathcal{A}\otimes \mathcal{B})$.  If the two registers are independently prepared in states described by $\rho$ and $\sigma$, then the joint state is described by the  density matrix $\rho \otimes \sigma\in \mathbb{C}^{nm\times nm}$.


To interact with a quantum register  we need to measure it. One mathematical formalism of the process of measuring a  quantum system is the POVM,  defined as a set of positive semidefinite operators $\{P_i\}_{i=1}^m$ such that $\sum_{i=1}^mP_i=\mathbb{1}_\mathcal{H}$, where $\mathbb{1}_\mathcal{H}$ is the identity matrix on $\mathcal{H}$. If the quantum  register  $\mathcal{H}$ is in a state described by density matrix $\rho\in D(\mathcal{H})$, upon performing the measurement $\{P_i\}_{i=1}^m$ we get the outcome $i$ with probability $\langle P_i, \rho \rangle$,
where
$\langle A, B\rangle = \Tr(A^\dag B)$ is the \emph{Hilbert-Schmidt inner product} defined on  the linear space of Hermitian matrices.  Note that $\langle A, B\rangle$ is a real number for any Hermitian matrices $A$ and $B$, and is   non-negative if $A$ and $B$ are positive~semidefinite.
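
As a small illustration of the measurement rule (the specific state and POVM below are chosen purely for illustration and are not used elsewhere), consider a single qubit in state $\rho$ and the two-outcome POVM $\{P_1,P_2\}$ given by
\begin{equation*}
\rho=\frac{1}{4}\begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix},\qquad
P_1=\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix},\qquad
P_2=\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}.
\end{equation*}
Here $\rho$ is Hermitian with unit trace and positive determinant, hence a valid density matrix, and $P_1+P_2$ is the identity. The outcome probabilities are $\langle P_1,\rho\rangle=\Tr(P_1\rho)=3/4$ and $\langle P_2,\rho\rangle=\Tr(P_2\rho)=1/4$, which sum to $\Tr(\rho)=1$ as expected.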





 Given a finite-dimensional Hilbert   space $\mathcal{H}=\mathbb{C}^n$, we denote by $\text{L}(\mathcal{H})$ the set of linear operators acting on $\mathcal{H}$, i.e.,   the set of all $n\times n$ complex matrices over $\mathcal{H}$.
 A linear operator that maps matrices to matrices, i.e.,  a mapping  $\Phi:\mathrm{L}(\mathcal{B}) \to \mathrm{L}(\mathcal{A})$, is called a {\em super-operator}. The adjoint  super-operator $\Phi^\dagger:\mathrm{L}(\mathcal{A}) \to \mathrm{L}(\mathcal{B})$  is uniquely determined by the equation
$    \langle A, \Phi(B)\rangle = \langle \Phi^\dagger(A), B\rangle
$.  A super-operator $\Phi:\mathrm{L}(\mathcal{B})\to\mathrm{L}(\mathcal{A})$ is    {\em positive} if it maps PSD matrices  to PSD matrices.
There exists a  linear bijection between  matrices $R\in \mathrm{L}(\mathcal{A}\otimes\mathcal{B})$ and super-operators $\Phi:\mathrm{L}(\mathcal{B})\to\mathrm{L}(\mathcal{A})$ known as the {\em Choi-Jamio\l{}kowski isomorphism}. Specifically, for a super-operator $\Phi$  its {\em Choi matrix}~is:
\begin{equation}\label{CJ}
    C_\Phi= \sum_{1\leq i,j\leq m} \Phi(E_{i,j}) \otimes E_{i,j}\in \mathrm{L}(\mathcal{A}\otimes\mathcal{B}),
\end{equation}
where $\{E_{i,j}\}_{i,j=1}^m$ is the standard orthonormal basis of $\mathrm{L}(\mathcal{B}) = \mathbb{C}^{m\times m}$. Conversely, given an operator $R=\sum_{1\le i,j\le m}A_{i,j}\otimes E_{i,j}\in \mathrm{L}(\mathcal{A}\otimes\mathcal{B})$, we can define $\Phi_R:\mathrm{L}(\mathcal{B})\to\mathrm{L}(\mathcal{A})$ by setting $\Phi_R(E_{i,j})=A_{i,j}$ from which it easily follows that $C_{\Phi_R}=R$. Explicitly, we~have
\begin{equation}\label{eqn:superoperator}
    \Phi_R(B) = \mathrm{Tr}_\mathcal{B} (R(\mathbb{1}_\mathcal{A}\otimes B^\top)),
    \end{equation}
    where the partial trace
    $ \mathrm{Tr}_\mathcal{B}:\mathrm{L}(\mathcal{A} \otimes \mathcal{B})\to \mathrm{L}(\mathcal{A})$
    is the {\em unique} function
 that satisfies:
\begin{equation*}\label{basic:ptrace}
\mathrm{Tr}_\mathcal{B}(A\otimes B)=A\Tr(B), \  \forall A, B.
\end{equation*}
Moreover, the  adjoint map is $\mathrm{Tr}_\mathcal{B}^\dagger(A)=A\otimes \mathbb{1}_\mathcal{B}$.
Lastly,  a super-operator $\Phi$ is completely positive (i.e., $\mathrm{id}_k\otimes \Phi$ is positive for all $k\in \mathbb{N}$, where $\mathrm{id}_k$ denotes the identity super-operator on $\mathbb{C}^{k\times k}$) iff the Choi matrix of $\Phi$ is positive semidefinite. In particular, if the Choi matrix of the super-operator $\Phi$ is PSD, it follows that $\Phi$ is positive.
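
To see that the converse of the last implication fails, consider as an illustrative example (not part of the game-theoretic setup) the transpose map $\Phi(B)=B^\top$ on $\mathbb{C}^{2\times 2}$, so that $n=m=2$. By \eqref{CJ}, and ordering the basis of $\mathcal{A}\otimes\mathcal{B}$ as $e_1\otimes e_1, e_1\otimes e_2, e_2\otimes e_1, e_2\otimes e_2$, its Choi matrix is
\begin{equation*}
C_\Phi=\sum_{1\le i,j\le 2}E_{j,i}\otimes E_{i,j}
=\begin{pmatrix}1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1\end{pmatrix},
\end{equation*}
i.e., the swap operator $e_a\otimes e_b\mapsto e_b\otimes e_a$. This matrix has eigenvalue $-1$ on the vector $e_1\otimes e_2-e_2\otimes e_1$, so $C_\Phi$ is not PSD and $\Phi$ is not completely positive, even though $\Phi$ is positive (transposition preserves Hermiticity and eigenvalues). As a consistency check of \eqref{eqn:superoperator}, we have
\begin{equation*}
\mathrm{Tr}_\mathcal{B}\big(C_\Phi(\mathbb{1}_\mathcal{A}\otimes B^\top)\big)
=\sum_{1\le i,j\le 2}E_{j,i}\,\Tr(E_{i,j}B^\top)
=\sum_{1\le i,j\le 2}B_{i,j}E_{j,i}=B^\top.
\end{equation*}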

\paragraph{Quantum potential games.} 

In the setting of quantum games described in the introduction, we introduce the notion of a quantum potential game as follows. For simplicity we restrict ourselves in this work to the case of two-player games, though the definition can easily be extended to any finite number of players. Furthermore, we restrict ourselves to quantum potential games with bilinear potential. We follow standard game theory notation, where for each player $i$ the set $S_{-i}$ refers to the other player's strategy set.

\begin{definition}[Quantum potential game]
\label{def:_potential_game_quantum}
    Let $\mathcal{A}, \mathcal{B}$ be finite-dimensional quantum registers, and suppose that ${V: D(\mathcal{A}) \times D(\mathcal{B}) \to \mathbb{R}, \ (\rho, \sigma) \mapsto \Tr(R(\rho \otimes \sigma))}$ for some Hermitian operator $R$. A two-player game where the players have strategy sets $S_1 = D(\mathcal{A}), S_2 =  D(\mathcal{B})$ and utility functions $u_i: S_1 \times S_2 \rightarrow \mathbb{R}$ for all players~$ i \in \{1, 2\}$ is called a quantum potential game with potential $V$  if
   
    \begin{equation*}
    \begin{split}
        u_i(s, s_{-i}) - u_i(s', s_{-i})
        =
        V(s, s_{-i}) - V(s', s_{-i}).
    \end{split}
    \end{equation*}
for all players~$ i \in \{1, 2\}$, $ s_{-i} \in S_{-i}$, and $s, s' \in S_i.$
\end{definition}

Quantum potential games fall into the general class of potential games and so admit an equivalent characterization of coordination-dummy separability (see, e.g., \cite{la2016potential}): each player's utility can be separated into a coordination term (which is the same for all players and equal to the potential $V$) and a dummy term (that only depends on the other players), i.e.,
\begin{equation*}
        u_i(s)=V(s)+D_i(s_{-i}).
\end{equation*}
Due to coordination-dummy separation, for each player $i$ the gradients of $u_i(s)$ and $V(s)$ with respect to their own strategy are equal. Thus, the trajectories that players' strategies take under first-order learning dynamics will be the same whether the players play the potential game or the CIG where each player receives utility $V$.
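
For instance, suppose that Alice's utility is $u_1(\rho,\sigma)=\Tr(R(\rho\otimes\sigma))+\Tr(Q\sigma)$, where $Q$ is a hypothetical Hermitian matrix on $\mathcal{B}$ introduced only for this illustration. The dummy term $D_1(\sigma)=\Tr(Q\sigma)$ does not depend on $\rho$, so for any direction $\xi$ and any $t\in\mathbb{R}$,
\begin{equation*}
u_1(\rho+t\xi,\sigma)-u_1(\rho,\sigma)
= t\,\Tr(R(\xi\otimes\sigma))
= V(\rho+t\xi,\sigma)-V(\rho,\sigma).
\end{equation*}
Hence the directional derivatives of $u_1$ and $V$ with respect to $\rho$ coincide, and Alice's first-order feedback is the same in both games.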

For a two-player game with players Alice and Bob having access to quantum registers $\mathcal{H}_1 = \mathcal{A}$ and $\mathcal{H}_2 = \mathcal{B}$ respectively, we can define Alice's \emph{best response set} to Bob's strategy $\sigma\in D(\mathcal{B})$  by $\BR_{\A}(\sigma) = \{\rho \in \da : u(\rho, \sigma) \ge u(\rho', \sigma) \; \fa \rho' \in \da\}$, and analogously for Bob.
The \emph{Nash equilibria} (NE) of the game are  the strategy profiles $(\rho, \sigma)\in D(\mathcal{A})\times D(\mathcal{B})$ such that Alice's and Bob's strategies are best responses to each other, i.e.
$$ u(\rho, \sigma) \geq u(\rho', \sigma) \; \fa \rho' \in \da $$
and 
$$ \ u(\rho, \sigma) \geq u(\rho, \sigma') \; \fa \sigma' \in \db.$$
Lastly, a Nash equilibrium $(\rho,\sigma)$ is called {\em interior}  if both $\rho$ and $\sigma$ are positive definite.

The set of Nash equilibria in a quantum potential game with potential $V$ is equivalent to the set of Nash equilibria in the quantum common-interest game with common utility $V$ since, by coordination-dummy separability, no player can unilaterally improve their own utility if and only if no player can unilaterally improve the potential $V$. Thus, for the purpose of learning Nash equilibria in quantum potential games using first-order dynamics, it suffices to study quantum common-interest games.























    











    
    






\paragraph{Quantum common-interest games.}
In a quantum CIG, there are two agents, Alice and Bob, who
 control quantum registers $\mathcal{A}$ and $\mathcal{B}$ and whose strategies are given by density matrices in $D(\mathcal{A})$ and $ D(\mathcal{B})$ respectively. Upon playing strategy profile $(\rho,\sigma)\in \ D(\mathcal{A})\times  D(\mathcal{B})$  both players receive a common utility $u(\rho, \sigma) = \langle R, \rho\otimes\sigma\rangle$,  where $R$ is a Hermitian matrix that we can assume without loss of generality to be positive definite (adding a multiple of the identity to $R$ only shifts the utility by a constant, since $\Tr(\rho\otimes\sigma)=1$).

We refer to the matrix $R$ as the {\em game operator}. Equivalently,
using the Choi-Jamio\l{}kowski isomorphism defined in~\eqref{CJ}, it is useful to
also express the utility function as $u(\rho, \sigma) = \langle \rho, \Phi(\sigma^\top) \rangle$, since
\begin{align*}
\langle \rho, \Phi(\sigma^\top) \rangle &= \langle \rho,  \mathrm{Tr}_\mathcal{B} (R(\mathbb{1}_\mathcal{A}\otimes \sigma))\rangle =\langle \rho\otimes \id_\mathcal{B},R(\mathbb{1}_\mathcal{A}\otimes \sigma)\rangle =\langle R, \rho\otimes \sigma\rangle,
\end{align*}
where $R$ is the Choi matrix of $\Phi$. 
Moreover, as $R$ is PSD it follows that $\Phi$ is positive. In order to simplify notation throughout the rest of the paper, we will drop the transpose from the utility and express it as $u(\rho,\sigma) = \langle \rho, \Phi(\sigma) \rangle$ where appropriate. This can be seen as Bob selecting $\sigma^\top$ as his strategy, instead of $\sigma$ as defined before.

A quantum CIG can also be defined  as the mixed extension of a
game  where the players' pure strategies are  complex unit vectors $x\in \mathbb{S}_\mathbb{C}^{n-1}, y\in\mathbb{S}_\mathbb{C}^{m-1}$  and the common utility  is biquadratic, i.e., $u(x,y)=(x\otimes y)^\dagger R(x\otimes y).$ 

If the players randomize their play using finitely supported distributions $\mathcal{D}_\mathcal{A}, \mathcal{D}_\mathcal{B}$ over their pure strategy spaces, i.e., $\mathcal{D}_\mathcal{A}$ has support $\{x_i\}_{i=1}^k$ and ${\rm Prob}(x_i)=\lambda_i$ whereas $\mathcal{D}_\mathcal{B}$ has support $\{y_j\}_{j=1}^\ell$ and ${\rm Prob}(y_j)=\mu_j$, the expected payoff is bilinear in the density matrices $\rho=\sum_{i=1}^k\lambda_ix_ix_i^\dagger$ and $\sigma=\sum_{j=1}^\ell \mu_jy_jy_j^\dagger$~as
$$\mathbb{E}[(x\otimes y)^\dagger R(x\otimes y)]=\Tr(R(\rho\otimes \sigma)),$$
where expectation is taken over $x\sim\mathcal{D}_\mathcal{A}$, $y\sim \mathcal{D}_\mathcal{B}$.
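
For completeness, this identity can be verified in one line, assuming that $x$ and $y$ are drawn independently and using $(x\otimes y)(x\otimes y)^\dagger=xx^\dagger\otimes yy^\dagger$ together with linearity of the trace:
\begin{align*}
\mathbb{E}[(x\otimes y)^\dagger R(x\otimes y)]
&=\sum_{i=1}^k\sum_{j=1}^\ell \lambda_i\mu_j\,\Tr\big(R(x_ix_i^\dagger\otimes y_jy_j^\dagger)\big)\\
&=\Tr\Big(R\Big(\sum_{i=1}^k\lambda_ix_ix_i^\dagger\otimes\sum_{j=1}^\ell\mu_jy_jy_j^\dagger\Big)\Big)
=\Tr(R(\rho\otimes \sigma)).
\end{align*}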

Lastly, we show that a classical CIG with common utility $x^\top A y$ where $x\in \Delta_n, y\in \Delta_m$ can be viewed as a quantum CIG.   Indeed, consider  the quantum CIG with diagonal game operator $R\in \mathbb{R}^{nm\times nm}$ whose diagonal entries  are  $R_{ij,ij}=A_{ij}$. If we only consider  diagonal densities $\rho=\sum_{i=1}^n x_ie_ie_i^\dagger$ and $\sigma=\sum_{j=1}^my_je_je_j^\dagger$, it is straightforward to verify that  $x^\top A y={\rm Tr}(R(\rho \otimes \sigma))$.
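
As a concrete sketch for $n=m=2$, and assuming the basis of $\mathcal{A}\otimes\mathcal{B}$ is ordered as $e_1\otimes e_1, e_1\otimes e_2, e_2\otimes e_1, e_2\otimes e_2$, we have
\begin{equation*}
R=\mathrm{diag}(A_{11},A_{12},A_{21},A_{22}),\qquad
\rho\otimes\sigma=\mathrm{diag}(x_1y_1,\,x_1y_2,\,x_2y_1,\,x_2y_2),
\end{equation*}
so that ${\rm Tr}(R(\rho\otimes\sigma))=\sum_{i,j=1}^2A_{ij}x_iy_j=x^\top Ay$. Moreover, since $\sigma$ is diagonal and hence $\sigma^\top=\sigma$, equation \eqref{eqn:superoperator} gives $\Phi_R(\sigma)=\mathrm{diag}\big((Ay)_1,(Ay)_2\big)$, i.e., the super-operator returns Alice's classical payoff vector on the diagonal.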

























\paragraph{Relation between quantum CIGs and the BSS problem.} In a quantum CIG, Alice and Bob try to jointly maximize their common utility function $u(\rho, \sigma)=\langle \rho, \Phi(\sigma) \rangle$. Analogous to the classical case, there is a strong connection between the NE of the game and the underlying \ref{BSS} optimization problem. Recall that the \ref{BSS} problem corresponds to maximizing a linear function over the set of separable states, i.e., 
 \begin{equation}\label{BSS}\tag{BSS}
        \max \{ \Tr(R(\rho\otimes \sigma)):  \rho \in \da, \sigma \in \db\}.
\end{equation} 
\ref{BSS} is closely related to the problem of testing whether a state is separable, which is computationally intractable \cite{gurvits2003classical, gharibian2008strong} and has been studied extensively, e.g. see  
\cite{ioannou2006computational, bruss2002characterizing, terhal2002detecting, doherty2004complete, barak2017quantum}. 




\begin{theorem}\label{thm:_KKT_NE_equiv}
The Nash equilibria of a two-player quantum common-interest game with common utility function $u(\rho, \sigma)=\langle \rho, \Phi(\sigma) \rangle$
correspond to the KKT points of~\ref{BSS}.

\end{theorem}
The proof 
is deferred to Appendix~\ref{proof:_KKT_NE_equiv}.








For a classical game, if $(x,y)$ is a Nash equilibrium, every pure strategy that is played by Alice with positive probability  is a best response to $y$, i.e., for each $i$ with  $x_i > 0$ we have $(Ay)_i=x^\top Ay$, and similarly for Bob.
We now state the analogous result for quantum CIGs.




\begin{theorem}
\label{thm:_interiorNE_equal_payoff}
    Let $(\rho, \sigma)$ be a Nash equilibrium of a two-player quantum CIG  with common utility  $u(\rho, \sigma) = \innerprod{\rho}{\Phi(\sigma)}$. If  $\rho \succ 0$, we have that  $
        \Phi (\sigma) = \innerprod{\rho}{\Phi(\sigma)}\id_\mathcal{A}$, so that every $\rho'
        \in D(\mathcal{A}) $ is a best response to $\sigma$, i.e., $\rho'\in \BR_{\A}(\sigma).$
  
Similarly,  if $(\rho, \sigma)$ is a Nash equilibrium and $\sigma \succ 0$, then $\Phi^\dagger(\rho) =~\innerprod{\rho}{\Phi(\sigma)}\id_\mathcal{B}$.
\end{theorem}
The proof
is deferred to Appendix \ref{proof:_interiorNE_equal_payoff}.
With the connection between Nash equilibria and KKT points established, and  motivated by the well-known  classical result that `natural' learning dynamics converge to Nash equilibria in classical CIGs, in the next section
we propose a non-commutative extension of one such family of gradient flow dynamics and study their theoretical convergence properties.



\section{Continuous-time Dynamics}\label{sec:contdynamics}

\paragraph{Gradient flow dynamics.}
While the implementation of learning in game theory often requires algorithms in discrete time, past work has shown that continuous-time dynamics can give rise to families of discrete dynamics. The most relevant such examples to our work are in the context of gradient-based optimization algorithms \cite{wibisono2016variational, nemirovskij1983problem} and evolutionary game dynamics \cite{mertikopoulos2018riemannian, shahshahani1979new}.


Consider a differentiable manifold $\M$ equipped with a differentiable scalar field $u : \M \rightarrow \mathbb{R}$ and a symmetric, positive-definite inner product $\innerprod{\cdot}{\cdot}_p : T_p \M \times T_p \M \rightarrow~\mathbb{R}$ defined at all $p \in \M$. (Here $T_p \M$ is the tangent space of $\M$ at $p$.) 
By the Riesz Representation Theorem (see, e.g., \cite{rudin1987real}), at each  $p \in \M$ there exists a \emph{unique} vector $\grads{} u(p) \in T_p \M$~with
\begin{equation}\label{gengrad}
    D_pu(\xi) = \innerprod{\grads{} u(p)}{\xi}_p \quad \fa \xi \in T_p \M,
\end{equation}
where $D_pu : T_p \M \rightarrow \mathbb{R}$ is the directional derivative of $u$ at the point $p$, evaluated in direction $\xi$, i.e.,
$D_pu(\xi)~=~\langle \nabla u(p), \xi\rangle$ where $ \nabla u(p)$ is the usual Euclidean gradient of $u$ at $p$ and $\langle \cdot, \cdot \rangle$ is the Euclidean inner product. Equation  \eqref{gengrad} allows us to associate to each point $p\in \M$ a vector $\grads{}u(p)\in~T_p~\M$, or in other words, to define a gradient flow on the manifold $\M$ given explicitly by  $  \dot{p} = \grads{} u(p)$.
Moreover, simply by construction, it follows that the function $u(p)$ is nondecreasing along the trajectories of the gradient flow, i.e., $\dv{u(p)}{t}\ge 0$ since $\dv{u(p)}{t}
=\langle \nabla u(p), \dot{p}\rangle
=D_pu(\dot{p})=\innerprod{\grads{} u(p)}{\dot{p}}_p = \innerprod{\dot{p}}{\dot{p}}_p \ge0,$
and $\dv{u(p)}{t}=0$ if and only if $\dot{p}=0$ (as the inner product $\innerprod{\cdot}{\cdot}_p $ is positive definite), so $u$ is in fact strictly increasing along gradient flow trajectories except at fixed~points.








\paragraph{Quantum Shahshahani gradient flow.}
\label{subsec:_GradFlow_IntrinsicMetric}
Consider a two-player quantum CIG  with common utility 
$
u(\rho, \sigma)= \langle\rho, \Phi(\sigma) \rangle.
$
Our goal is to provide continuous-time dynamics that improve the utility $u(\rho,\sigma)$. The state space we are operating in is the manifold
$\M~=~\da~\times~\db$,
so all that remains is to select a metric on the manifold of density matrices which would imbue the product manifold $\M$ with the product metric, giving a gradient flow.
To accomplish this, we consider the generalized family of Riemannian metrics on the manifold of  PSD matrices parametrized by $q \in \mathbb{R}$, which we call the \emph{quantum $q$-Shahshahani metric}:
\begin{equation}
\label{cor:_a_quantShah}
    \innerprod{A}{B}_\rho^{(q)} := \Tr[\rho^{-\frac{q}{2}}A \rho^{-\frac{q}{2}}B].
\end{equation}
Indeed, in the case of diagonal matrices this family of metrics reduces to the $q$-Shahshahani family of metrics on the simplex (see \cite{mertikopoulos2018riemannian}).  On the PSD manifold, $q = 0$ gives the Euclidean inner product $\Tr[AB]$, while $q = 2$ gives the intrinsic Riemannian metric, e.g. see   \cite{bhatia2009positive}. In addition, $q=1$ reduces to the Shahshahani metric on the simplex in the case of diagonal~matrices.


\begin{theorem}[{Linear quantum $q$-replicator dynamics}]
\label{thm:_a_quantShah}
Consider a quantum CIG with utility function    $u(\rho,\sigma)=\langle \rho, \Phi(\sigma) \rangle$ where $\rho \in D(\mathcal{A}), \sigma \in D(\mathcal{B})$. The dynamics
\begin{equation} %
\begin{gathered} \label{eqn:_aQREP_family} \tag{\rm{lin-QREP$_q$}}
    \dv{\rho}{t}  = \rho^\frac{q}{2} \left[\Phi(\sigma) - \frac{\Tr[\rho^q \Phi(\sigma)]}{\Tr[\rho^q]}\id_\mathcal{A} \right] \rho^\frac{q}{2}, \\
    \dv{\sigma}{t}  =  \sigma^\frac{q}{2} \left[\Phi^\dagger (\rho) -\frac{\Tr[\sigma^q \Phi^\dagger(\rho)]}{\Tr[\sigma^q]}\id_\mathcal{B} \right] \sigma^\frac{q}{2}
\end{gathered}
\end{equation}
define a gradient flow of the utility function $u(\rho, \sigma)$ on the product manifold $D(\mathcal{A})\times D(\mathcal{B})$ imbued  with the quantum $q$-Shahshahani metric. Moreover, the utility  $u(\rho,\sigma)$ is strictly increasing  along the trajectories of the \ref{eqn:_aQREP_family} dynamics, unless we are at a fixed~point.
\end{theorem}
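
The following brief sketch indicates where the normalizing term in \ref{eqn:_aQREP_family} comes from. It is not a substitute for the full argument, and it assumes an interior point $\rho\succ 0$, at which the tangent vectors to $D(\mathcal{A})$ are the traceless Hermitian matrices. For fixed $\sigma$, the directional derivative of $u$ in direction $\xi$ is $\Tr[\Phi(\sigma)\xi]$, so by \eqref{gengrad} and \eqref{cor:_a_quantShah} the gradient $G$ of $u$ with respect to $\rho$ must satisfy
\begin{equation*}
\Tr\big[\rho^{-\frac{q}{2}}G\rho^{-\frac{q}{2}}\xi\big]=\Tr[\Phi(\sigma)\xi]
\quad\text{for all traceless Hermitian } \xi .
\end{equation*}
This forces $\rho^{-\frac{q}{2}}G\rho^{-\frac{q}{2}}=\Phi(\sigma)-c\,\id_\mathcal{A}$ for some scalar $c$, i.e., $G=\rho^{\frac{q}{2}}[\Phi(\sigma)-c\,\id_\mathcal{A}]\rho^{\frac{q}{2}}$. Requiring $G$ to be a tangent vector, $\Tr[G]=0$, then gives
\begin{equation*}
\Tr[\rho^q\Phi(\sigma)]-c\,\Tr[\rho^q]=0
\quad\Longrightarrow\quad
c=\frac{\Tr[\rho^q\Phi(\sigma)]}{\Tr[\rho^q]},
\end{equation*}
which matches the first equation of \ref{eqn:_aQREP_family}; the computation for $\sigma$ is symmetric.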



    
    
    
    
    
    

The proof that \ref{eqn:_aQREP_family} is a gradient flow is given in Appendix \ref{sec:_GradFlowDerivation}. In terms of the convergence properties of \ref{eqn:_aQREP_family} we have the following result:






\begin{corollary}\label{cor:omegalimits}
    The set of $\omega$-limit points of a trajectory $\{\rho(t), \sigma(t)\}_{t\ge 0}$ of the \ref{eqn:_aQREP_family} dynamics is a compact, connected set of fixed points of the dynamics that all attain the same utility.
\end{corollary}

The proof of this result follows directly from  an extension of the fundamental convergence theorem by \cite{losert1983dynamics} to general compact sets,
which we prove in   Theorem \ref{thm:_LimitSetCompactConnected_ContTime}.















\paragraph{Linear quantum replicator dynamics.}








For
$q = 1$, the \ref{eqn:_aQREP_family} dynamics specialize~to:
\begin{equation}
\begin{gathered}\label{eqn:_QREP}\tag{{\rm lin-QREP}}
        \dv{\rho}{t}
        = \rho^{\sfrac{1}{2}} \Big[\Phi(\sigma) - \innerprod{\rho}{\Phi(\sigma)} \id_\mathcal{A} \Big] \rho^{\sfrac{1}{2}},\\
        \dv{\sigma}{t}
        = \sigma^{\sfrac{1}{2}} \Big[\Phi^\dagger (\rho) -\innerprod{\rho}{\Phi(\sigma)} \id_\mathcal{B} \Big] \sigma^{\sfrac{1}{2}}
\end{gathered}
\end{equation}
which we call the {\em linear quantum replicator dynamics}.
We next observe that the \ref{eqn:_QREP} dynamics are a non-commutative generalization of the celebrated replicator dynamics \cite{weibull1997evolutionary,sandholm2010population}: specifically, the  \ref{eqn:_QREP} dynamics reduce to the usual replicator dynamics when applied to the quantum embedding of a classical  CIG with common utility  $x^\top Ay$. A full explanation of this observation can be found in Appendix \ref{appsecs:diagonal}.
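
In brief, and assuming the diagonal embedding $\rho=\sum_{i}x_ie_ie_i^\dagger$, $\sigma=\sum_{j}y_je_je_j^\dagger$ with diagonal game operator $R_{ij,ij}=A_{ij}$, one has $\Phi(\sigma)=\mathrm{diag}(Ay)$ and $\langle\rho,\Phi(\sigma)\rangle=x^\top Ay$. All matrices on the right-hand side of \ref{eqn:_QREP} are then diagonal and commute, so that
\begin{equation*}
\frac{d\rho}{dt}
=\sum_{i}x_i\big((Ay)_i-x^\top Ay\big)\,e_ie_i^\dagger .
\end{equation*}
Thus $\rho$ remains diagonal along the trajectory and its diagonal entries satisfy $\dot{x}_i=x_i((Ay)_i-x^\top Ay)$, which is the classical replicator equation for the $x$-player; the same computation applies to $\sigma$ with $A^\top x$ in place of $Ay$.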

Finally, we relate the interior fixed points and limit points of \ref{eqn:_QREP} with Nash equilibria.

\begin{theorem}
\label{thm:qrepproperties}
For a quantum CIG with common utility function $u(\rho,\sigma)=\langle \rho, \Phi(\sigma)\rangle $ where $\rho \in D(\mathcal{A}), \sigma \in D(\mathcal{B})$, we have the following two properties relating interior fixed points and $\omega$-limit points of the \ref{eqn:_QREP} dynamics with Nash equilibria of the game:
\begin{enumerate}
	\item The set of interior fixed points of the \ref{eqn:_QREP} dynamics coincides with the set of interior Nash equilibria.
	\item The interior $\omega$-limits of any trajectory of the \ref{eqn:_QREP} dynamics are Nash equilibria.
\end{enumerate}
\end{theorem}
\begin{proof} $(1)$ By Theorem \ref{thm:_interiorNE_equal_payoff}, if $(\rho, \sigma)$ is an interior NE of the game, then $\Phi(\sigma) = \innerprod{\rho}{\Phi(\sigma)} \id_\mathcal{A}$  and $\Phi^\dagger(\rho)~=~ \innerprod{\rho}{\Phi(\sigma)} \id_\mathcal{B}$. This immediately implies that
    $\dot{\rho} = \rho^{\sfrac{1}{2}} \left[\Phi(\sigma) - \innerprod{\rho}{\Phi(\sigma)} \id_\mathcal{A} \right] \rho^{\sfrac{1}{2}} = 0$
    and
    $\dot{\sigma}~=~\sigma^{\sfrac{1}{2}} \left[\Phi^\dagger(\rho) - \innerprod{\rho}{\Phi(\sigma)} \id_\mathcal{B} \right] \sigma^{\sfrac{1}{2}}~=~0$, i.e., $(\rho, \sigma)$ is a fixed point of the \ref{eqn:_QREP} dynamics. Conversely, let $(\rho, \sigma)$ be an interior fixed point of the \ref{eqn:_QREP} dynamics. As $\rho$ is invertible, so is $\rho^{\sfrac{1}{2}}$, and multiplying $\dot{\rho}=0$ on both sides by $\rho^{-\sfrac{1}{2}}$ gives
$\Phi(\sigma) = \innerprod{\rho}{\Phi(\sigma)} \id_\mathcal{A}.$ Hence $\innerprod{\rho'}{\Phi(\sigma)} = \innerprod{\rho}{\Phi(\sigma)}$ for every $\rho' \in D(\mathcal{A})$, so no deviation improves the payoff and
$\rho \in \BR_{\A}(\sigma)$; similarly, $\sigma \in \BR_{\B}(\rho)$. Thus, $(\rho, \sigma)$ is an interior~NE.

$(2)$ By Corollary \ref{cor:omegalimits}, all $\omega$-limits of any trajectory of the \ref{eqn:_QREP} dynamics are fixed points. But as we have just proven, interior fixed points are Nash equilibria.
\end{proof}
















\section{Discrete-time Dynamics}\label{sec:discdynamics}
Consider a  quantum common-interest game  with  utility 
$
u(\rho, \sigma)= \innerprod{\rho}{\Phi(\sigma)},
$
where the Choi matrix $R$ corresponding to the superoperator $\Phi$ is positive definite.
 In this section we study a  discretization of the \ref{eqn:_QREP} dynamics given~by

\begin{equation}
\begin{gathered}\label{eqn:_DQREP} \tag{\rm{lin-MMWU}}
        \new{\rho} \leftarrow \frac{1}{\innerprod{\rho}{\Phi(\sigma)}}
        \powh{\rho} \Phi(\sigma) \powh{\rho},\\
        \new{\sigma} \leftarrow
        \frac{1}{\innerprod{\new{\rho}}{\Phi(\sigma)}} \powh{\sigma} \adj{\Phi}(\new{\rho}) \powh{\sigma}
\end{gathered}
\end{equation}
which we call the {\em linear matrix multiplicative weights update}. The \ref{eqn:_DQREP}     is defined in an  alternating manner (i.e.,  $\rho$ and $\sigma$ are updated in turn) and only uses first-order information (i.e., to perform the update, each agent only needs to know the   gradient of the  utility with respect to their own density).
Moreover, the  \ref{eqn:_DQREP} is a 
non-commutative extension of \ref{linear} in the sense that  \ref{eqn:_DQREP} reduces to
\ref{linear} when the game operator is diagonal and $\rho, \sigma$ are diagonal~densities.  Indeed, define the diagonal game operator $R_{ij,ij}=A_{ij}$, and consider  diagonal density matrices $\rho=\sum_\ell x_\ell e_\ell e_\ell^\dagger$ and $\sigma=\sum_k y_k f_k f_k^\dagger $. By  \eqref{diag} we have  $\Phi(\sigma)={\rm diag}(Ay).$ 
Thus, 
$$ \new{\rho} =\frac{1}{\innerprod{\rho}{\Phi(\sigma)}}
        \powh{\rho} \Phi(\sigma) \powh{\rho}={1\over x^\top Ay}{\rm diag}(x \circ (Ay)),
      $$
where $\circ$ denotes the componentwise  vector~product.

For \ref{eqn:_DQREP}
 to be well-defined, we need the game operator $R$ to be positive definite. Indeed, when applied to $(\rho, \sigma) \in \da \times \db$, the update $\new{\rho}$ is also a density matrix: $\rho^{1/2}\Phi(\sigma)\rho^{1/2}$ is PSD (as $R$ is PSD and thus $\Phi$ is positive), and the scalar $\innerprod{\rho}{\Phi(\sigma)}$ is strictly positive (as $\innerprod{\rho}{\Phi(\sigma)}=\langle R, \rho\otimes \sigma\rangle$, $R$ is positive definite, and $\rho \otimes \sigma$ is a nonzero PSD matrix). Lastly, by the cyclic property of the trace, $\Tr[\powh{\rho} \Phi(\sigma) \powh{\rho}] = \innerprod{\rho}{\Phi(\sigma)}$, so the update $\new{\rho}$ has trace equal to one.
 
 



We now show that  the utility is increasing under the discrete-time~\ref{eqn:_DQREP} updates, as it was along trajectories of the continuous-time \ref{eqn:_QREP} dynamics.



\begin{theorem}\label{thm:nondecreasingutilitydiscrete}
   For any quantum CIG with a positive definite game operator $R$, the common utility $u(\rho, \sigma) = \innerprod{R}{\rho \otimes \sigma} = \innerprod{\rho}{\Phi(\sigma)}$
    is strictly increasing along the trajectories of \ref{eqn:_DQREP}, except at a fixed point.
\end{theorem}

\begin{proof} First, note that
\begin{align*}
    \innerprod{\new{\rho}}{\Phi(\sigma)}
=\frac{1}{\innerprod{\rho}{\Phi(\sigma)}}
\Tr[\powh{\rho} \Phi(\sigma) \powh{\rho} \Phi(\sigma)]
=\frac{1}{\innerprod{\rho}{\Phi(\sigma)}}
\Tr[(\rho^{\sfrac{1}{4}} \Phi(\sigma) \rho^{\sfrac{1}{4}})^2].
\end{align*}

Since $\Tr[(\powh{\rho})^2]=1$, we get that  
\begin{align*}
    \Tr[(\rho^{\sfrac{1}{4}} \Phi(\sigma) \rho^{\sfrac{1}{4}})^2] = \Tr[(\rho^{\sfrac{1}{4}} \Phi(\sigma) \rho^{\sfrac{1}{4}})^2] \Tr[(\powh{\rho})^2] 
    \geq \Tr[(\rho^{\sfrac{1}{4}} \Phi(\sigma) \rho^{\sfrac{1}{4}})\powh{\rho}]^2 = \innerprod{\rho}{\Phi(\sigma)}^2
\end{align*}
where the inequality follows by Cauchy-Schwarz. Moreover, if equality holds then $\rho^{\sfrac{1}{4}} \Phi(\sigma) \rho^{\sfrac{1}{4}} = c\powh{\rho}$ for some scalar $c$. Consequently,  $\powh{\rho} \Phi(\sigma) \powh{\rho} = \rho^{\sfrac{1}{4}} (\rho^{\sfrac{1}{4}} \Phi(\sigma) \rho^{\sfrac{1}{4}}) \rho^{\sfrac{1}{4}} = c \rho$, and so $\new{\rho} = \frac{c}{\innerprod{\rho}{\Phi(\sigma)}} \rho = \rho$ (since $\Tr[\new{\rho}] = 1$).
Putting everything together we get that $\innerprod{\new{\rho}}{\Phi(\sigma)} \geq \innerprod{\rho}{\Phi(\sigma)}$ with equality if and only if $\new{\rho} = \rho$. Similarly, $\innerprod{\rho}{\Phi(\sigma)}$ is also strictly increasing under the $\sigma$-update unless $\new{\sigma} = \sigma$.
\end{proof}
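
As an illustration of Theorem~\ref{thm:nondecreasingutilitydiscrete}, consider a hypothetical diagonal instance (the numbers are chosen purely for illustration and are not taken from our experiments). Let $A = \begin{pmatrix} 2 & 1\\ 1 & 3\end{pmatrix}$, so that the diagonal game operator $R_{ij,ij}=A_{ij}$ is positive definite, and start from $x = y = (\tfrac12,\tfrac12)^\top$. In this commuting case the $\rho$-update reads $\new{x} = x\circ(Ay)/(x^\top A y)$; assuming analogously that $\adj{\Phi}(\new{\rho}) = {\rm diag}(A^\top \new{x})$, the $\sigma$-update reads $\new{y} = y\circ(A^\top \new{x})/(\new{x}^\top A y)$. One full iteration then gives
\begin{align*}
\new{x} &= \frac{(\tfrac34, 1)^\top}{\tfrac74} = (\tfrac37,\tfrac47)^\top, & \new{x}^\top A y &= \tfrac{25}{14},\\
\new{y} &= \frac{(\tfrac57,\tfrac{15}{14})^\top}{\tfrac{25}{14}} = (\tfrac25,\tfrac35)^\top, & \new{x}^\top A \new{y} &= \tfrac{13}{7}.
\end{align*}
The utility increases strictly after each half-step, $\tfrac74 < \tfrac{25}{14} < \tfrac{13}{7}$. In this example the Cauchy--Schwarz step of the proof reduces to $\sum_i x_i (Ay)_i^2 = \tfrac{25}{8} \geq \tfrac{49}{16} = (x^\top A y)^2$.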



As a consequence of Theorem \ref{thm:nondecreasingutilitydiscrete}, either agent can individually perform an update at any given time and increase the common utility, allowing \ref{eqn:_DQREP} to be viewed as a {\em decentralized} dynamic in which the agents do not need to coordinate
the order in which they perform updates.


Next, we obtain a convergence result for \ref{eqn:_DQREP} analogous to the one for \ref{eqn:_QREP}.

\begin{corollary}
\label{thm:_hofsigUpdatesAlternating_limitpoints_ClosedConnectedFixedPts}
    The set of limit points of an orbit $\{\big(\rho(t),\sigma(t)\big)\}_{t\in \mathbb{N}}$ of  \ref{eqn:_DQREP}  is a compact, connected set of fixed points.
\end{corollary}


The proof of this result follows directly from an extension of the fundamental convergence theorem by \cite{losert1983dynamics} to general compact sets (see
Theorem~\ref{thm:_LimitSetCompactConnected_DiscreteTime}).

Finally, we relate the fixed points of \ref{eqn:_DQREP}  with the fixed points of~\ref{eqn:_QREP}.


\begin{theorem}
\label{thm:_FixedPointsSame_ContTime_DiscreteTime}
    The set of fixed points of the discrete-time update rule \ref{eqn:_DQREP} is equal to the set of fixed points of the continuous-time gradient flow \ref{eqn:_QREP}.
\end{theorem}
The proof is deferred to Appendix~\ref{proof:fixedpoint}.
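
The following short computation, offered only as a sketch and not as a substitute for the proof in the appendix, indicates why the two sets of fixed points coincide. Writing $u = \innerprod{\rho}{\Phi(\sigma)} > 0$, the $\rho$-update of \ref{eqn:_DQREP} satisfies
\begin{equation*}
\new{\rho} - \rho = \frac{1}{u}\,\powh{\rho}\,\Phi(\sigma)\,\powh{\rho} - \rho = \frac{1}{u}\,\powh{\rho}\big[\Phi(\sigma) - u\,\id_\mathcal{A}\big]\powh{\rho},
\end{equation*}
i.e., it is a forward Euler step of the $\rho$-component of \ref{eqn:_QREP} with state-dependent step size $1/u$. Hence $\new{\rho} = \rho$ exactly when $\dot{\rho} = 0$, and the same computation applies to the $\sigma$-update once $\new{\rho} = \rho$.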





\section{Experiments}\label{sec:experiments}









\paragraph{\ref{eqn:_QREP} converges empirically to Nash equilibria.}

It is well known that natural learning dynamics converge to pure Nash equilibria in classical CIGs \cite{kleinberg2009multiplicative}. We experimentally test if a similar property holds for \ref{eqn:_QREP} by utilizing the concept of \emph{exploitability} \cite{johanson2011accelerating}, defined as $$\frac{1}{2}\left[\lambda_{\max} (\Phi(\sigma)) - \langle \rho, \Phi(\sigma)\rangle + \lambda_{\max} (\Phi^\dagger(\rho)) - \langle\Phi^\dagger(\rho), \sigma\rangle\right],$$
where $\lambda_{\max} (\Phi(\sigma))$ and $\lambda_{\max} (\Phi^\dagger(\rho))$ are the maximum eigenvalues of $\Phi(\sigma)$ and $\Phi^\dagger(\rho)$ respectively. Using the variational characterization of eigenvalues,
$$\lambda_{\max}(\Phi(\sigma))=\max\{\langle \rho', \Phi(\sigma)\rangle:\ \rho'\in \da\},$$
the difference $\lambda_{\max} (\Phi(\sigma)) - \langle \rho, \Phi(\sigma)\rangle $ is exactly the maximum gain the $\rho$-player can attain by unilaterally deviating from $(\rho,\sigma)$.
Thus, if a profile $(\rho, \sigma)$ is $\epsilon$-exploitable, then it is a $2\epsilon$-Nash equilibrium, in the sense that no player can unilaterally improve their payoff by more than $2\epsilon$. In Figure \ref{fig:exploit} we plot the exploitability of \ref{eqn:_QREP} in 100 randomly generated $\mathcal{H}_2 \otimes \mathcal{H}_2$ quantum CIG instances with uniform initialization, where $\mathcal{H}_n$ denotes an $n$-level quantum system. However, \ref{eqn:_DQREP} (the discretization of \ref{eqn:_QREP}) converges in some cases to states with positive exploitability.
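
To make the definition of exploitability concrete, consider a hypothetical diagonal instance (chosen purely for illustration) with $A = \begin{pmatrix} 2 & 1\\ 1 & 3\end{pmatrix}$, $\rho = {\rm diag}(x)$ and $\sigma = {\rm diag}(y)$, so that $\Phi(\sigma) = {\rm diag}(Ay)$; we assume analogously that $\Phi^\dagger(\rho) = {\rm diag}(A^\top x)$. At the uniform profile $x = y = (\tfrac12,\tfrac12)^\top$ we have $\Phi(\sigma) = \Phi^\dagger(\rho) = {\rm diag}(\tfrac32, 2)$ and $\innerprod{\rho}{\Phi(\sigma)} = \tfrac74$, so the exploitability is
\begin{equation*}
\frac{1}{2}\Big[\big(2 - \tfrac74\big) + \big(2 - \tfrac74\big)\Big] = \frac14 .
\end{equation*}
At the pure profile $x = y = (0,1)^\top$ we instead get $\Phi(\sigma) = \Phi^\dagger(\rho) = {\rm diag}(1,3)$ and $\innerprod{\rho}{\Phi(\sigma)} = 3$, so the exploitability is $0$ and the profile is a Nash equilibrium.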

\begin{figure}[!tb]
    \centering
    \begin{minipage}{.45\linewidth}
      \centering
      \includegraphics[width=.95\linewidth]{Images/exploit_cont300.png}
     \ref{eqn:_QREP}
   
    \end{minipage}
    \begin{minipage}{.45\linewidth}
      \centering
      \includegraphics[width=.95\linewidth]{Images/exploit_discrete300.png}
      \ref{eqn:_DQREP}
   
    \end{minipage}
    \caption{Exploitability of \ref{eqn:_QREP} and \ref{eqn:_DQREP}.}
    \label{fig:exploit}
\end{figure}



\paragraph{\ref{eqn:_DQREP} as an algorithm for the \ref{BSS} problem.}
In this section we evaluate the performance of  \ref{eqn:_DQREP} applied to 
the \ref{BSS} problem. 
The global optimum \texttt{OPT} of the \ref{BSS} problem for $\mathcal{H}_2 \otimes \mathcal{H}_2$ and $\mathcal{H}_2 \otimes \mathcal{H}_3$ systems can be obtained exactly by solving a semidefinite program (see Appendix \ref{appsecs:sdp} for a detailed explanation). We benchmark \ref{eqn:_DQREP} against this ground truth  optimal value.
In each run of the experiments, we randomly generate a Hermitian positive definite matrix $R$ and initialize \ref{eqn:_DQREP} at the uniform diagonal state (i.e., $\mathbb{1}_\mathcal{H}/n$) for both players. We run \ref{eqn:_DQREP} until convergence, which we detect by tracking the moving average (window size $= 5$) of the players' utility; we terminate the algorithm once this moving average has stabilized for several iterations. As a performance metric, we report the mean relative accuracy of \ref{eqn:_DQREP}'s output compared to \texttt{OPT} across 100 runs.
We also report the average number of iterations needed to find a fixed point/solution, along with the  standard deviation of the accuracy across the 100 runs. All these results are summarized  in Table \ref{table:sdpresults}. Figure \ref{fig:dqrepperformance} visualizes our results, and we also include a version of the experiment where the initializations for each player are random density matrices instead of uniform diagonal matrices.


\begin{table*}[!tb]
\centering
\caption{Empirical performance of discrete dynamic \ref{eqn:_DQREP} for the BSS problem.}
\label{table:sdpresults}
\begin{tabular}{@{}ccccc@{}}
\toprule
\multirow{2}{*}{\textbf{Problem Dimensions}}        & \multirow{2}{*}{\textbf{Runs}} & \multicolumn{2}{c}{\textbf{Accuracy}} & \multirow{2}{*}{\textbf{Average Iterations to Convergence}} \\ \cmidrule(lr){3-4}
                                      &                                & \textbf{Mean}   & \textbf{Std. Dev.}  &                                                             \\ \midrule
$\mathcal{H}_2 \otimes \mathcal{H}_2$ & 100                            & 0.972           & 0.0409              & 15.33                                                       \\
$\mathcal{H}_2 \otimes \mathcal{H}_3$ & 100                            & 0.965           & 0.0349              & 20.14                                                        \\ \bottomrule
\end{tabular}
\end{table*}

\begin{figure}[!htb]
    \centering
    \begin{minipage}{.45\linewidth}
      \centering
      \includegraphics[width=.95\linewidth]{Images/avgperformanceunif.png}
     Random games, uniform initialization
   
    \end{minipage}
    \begin{minipage}{.45\linewidth}
      \centering
      \includegraphics[width=.95\linewidth]{Images/avgperformancerand.png}
      Random games, random initializations
   
    \end{minipage}
    \caption{Ratio of the utility attained using \ref{eqn:_DQREP} vs \texttt{OPT}, averaged over 100 random BSS problem instances. Shaded region represents $\pm 1$ standard deviation from the mean, and each iteration represents alternating updates for $\rho$ and $\sigma$.}
    \label{fig:dqrepperformance}
\end{figure}



\begin{figure*}[!htb]
    \centering
    \includegraphics[width=0.91\linewidth]{Images/bloch/Slide2.PNG}
    \caption{Trajectories going to the boundary of the Bloch-sphere in 100 random game instances with uniform density initializations.
    Points are color-coded based on distance to the boundary, with yellow denoting points that are close to the boundary.
    }
    \label{fig:blochrandgame}
\end{figure*}





\paragraph{\ref{eqn:_DQREP} as an algorithm  for biquadratic optimization.}
 In the quantum CIG setting, the players' strategies are density matrices, which (via the spectral decomposition) correspond to distributions over rank-1 densities. Consequently,
a quantum CIG can be viewed as the mixed extension of a common-interest game where players choose unit vectors and share a biquadratic utility. 
 Our experiments suggest that  when the players in a  quantum CIG with game operator $R$ use the \ref{eqn:_DQREP},  their states converge to rank-1 density matrices, an intriguing  analogue of the  convergence result for classical CIGs.
 Specifically, we run \ref{eqn:_DQREP} on 100 randomly generated $\mathcal{H}_2 \otimes \mathcal{H}_2$ game instances (i.e., $4\times4$ Hermitian matrices $R\succ 0$) with uniform initialization for both players and visualize the trajectories on the Bloch sphere (Figure \ref{fig:blochrandgame}), which is a standard technique for visualizing $2\times2$ density matrices. (For a detailed explanation of the Bloch sphere visualization see Appendix~\ref{appsecs:bloch}.) We observe that the trajectories of \ref{eqn:_DQREP} converge to the boundary of the Bloch sphere, i.e., to rank-1 densities. Since rank-1 densities in the quantum CIG correspond to unit vectors in the biquadratic optimization problem over the product of unit spheres
$\max\{(x\otimes y)^\dagger R(x\otimes y):\|x\|_2=1, {\|y\|_2=1}\}$, this means that \ref{eqn:_DQREP} can be interpreted as a learning algorithm for solving the biquadratic problem. 
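
This correspondence can be checked directly. For unit vectors $x, y$ and the rank-1 densities $\rho = xx^\dagger$, $\sigma = yy^\dagger$, we have $\rho\otimes\sigma = (x\otimes y)(x\otimes y)^\dagger$, and therefore, using $u(\rho,\sigma) = \innerprod{R}{\rho\otimes\sigma}$ as in Theorem~\ref{thm:nondecreasingutilitydiscrete},
\begin{equation*}
u(\rho,\sigma) = \Tr\big[R\,(x\otimes y)(x\otimes y)^\dagger\big] = (x\otimes y)^\dagger R\,(x\otimes y),
\end{equation*}
which is exactly the biquadratic objective.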

Finally, we present larger-scale experiments in Appendix \ref{appsecs:additionalexp}, which show that our results on convergence to fixed points still hold in systems of larger dimensions.




 
 

 








 



\section{Conclusion}
This paper extends the existing framework for learning in quantum games beyond the zero-sum setting and studies learning in quantum potential games. We introduce and study quantum potential games, which from the perspective of first-order learning dynamics are equivalent to quantum common-interest games. For learning in quantum CIGs, we introduce non-commutative extensions of continuous and discrete dynamics used for learning in classical potential games and study their convergence properties. Our work establishes deep connections between (online) optimization theory (i.e. multiplicative weights update), traditional as well as evolutionary game theory (i.e. replicator dynamics), and quantum randomness/entanglement (i.e. density matrices).


This work opens up several exciting new research directions, the first of which is to theoretically corroborate the experimental findings for convergence of \ref{eqn:_QREP} to Nash equilibria.
Another intriguing open question is to what extent and under what conditions the \ref{eqn:_DQREP} dynamics can be provably shown to converge to a rank-1 matrix for both players in quantum CIGs, in analogy to the aforementioned classical results~\cite{kleinberg2009multiplicative, heliou2017learning,panageas2019multiplicative}. More broadly, the ultimate goal of this line of research is to develop a general theory for learning in arbitrary quantum games. Developing such a framework would require new notions of equilibration and convergence in quantum games, echoing well-explored results for learning in classical games~\cite{BaileyEC18,cesa2006prediction,fudenberg1998theory, roughgarden2010algorithmic}.


\section*{Acknowledgments}
 
This research is supported in part by the National Research Foundation, Singapore and the Agency for Science, Technology and Research (A*STAR) under its Quantum Engineering Programme NRF2021-QEP2-02-P05, and by the National Research Foundation, Singapore and DSO National Laboratories under its AI Singapore Program (AISG Award No: AISG2-RP-2020-016), NRF 2018 Fellowship NRF-NRFF2018-07, NRF2019-NRF-ANR095 ALIAS grant, grant PIESGP-AI-2020-01, AME Programmatic Fund (Grant No.A20H6b0151) from A*STAR and Provost’s Chair Professorship grant RGEPPV2101. Wayne Lin and Ryann Sim gratefully acknowledge support from the SUTD President's Graduate Fellowship (SUTD-PGF).

\bibliographystyle{abbrv}
