In this section, we introduce the concept of a quantum reward oracle and provide a formal description of the quantum bandit problem. We also introduce the mean reward function and the reproducing kernel Hilbert space (RKHS) associated with the kernel. Finally, we briefly review Mercer's theorem, which is a key tool for our theoretical analysis.
\subsection{Quantum Reward Oracle and Quantum Bandits}
Following \citet{wan2023quantum}, we introduce the notion of a quantum reward oracle.
Let $\cX$ be a set of actions.
We let $n \in \ZZ_{\ge 1}$ and consider 
the space $\stsp_n$ of $n$ qubits and the set $\meassp_n$ of binary sequences of length $n$.
For each action $x \in \cX$, 
let $y_x$ be a random variable taking values in $[0, 1]$ 
such that $P(y_x = v_x(\sigma)) = p_x(\sigma)$ for $\sigma \in \meassp_n$,
where $v_x: \meassp_n \rightarrow [0, 1]$ is a function
and $(p_x(\sigma))_{\sigma \in \meassp_n}$ is a probability measure on $\meassp_n$.
As in Sec.~\ref{subsec:quantum-monte-carlo}, we assume that 
a unitary operator $\cO_x$ on $\stsp_{n+1}$ is given and that the state $\cO_x \ket{0^{n+1}}$ is given by
\begin{equation*}
    \sum_{\sigma \in \meassp_n}
    \sqrt{p_x(\sigma)} \ket{\sigma}(\sqrt{1-v_x(\sigma)}\ket{0} + \sqrt{v_x(\sigma)}\ket{1}).
\end{equation*}
% As explained in Sec. \ref{subsec:quantum-monte-carlo}, for each action $x \in \cX$, we can regard $\cO_x$
% as a quantum algorithm that returns $v_x(\sigma)$ with probability $p_x(\sigma)$ for $\sigma \in \meassp_n$.
We assume that the expected reward associated with an action $x \in \cX$ is given by $\ex{y_x}$,
and we define the mean reward function $\mreward: \cX \rightarrow [0, 1]$ 
by $\mreward(x) = \ex{y_x}$.
In Sec. \ref{subsec:reward-function}, we shall detail assumptions on the mean reward function $\mreward$.
Following \citet{wan2023quantum}, we call the operator $\cO_x$ or its adjoint $\cO_x^\dagger$ a quantum reward oracle.
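
As a quick consistency check (a sketch that uses only the state displayed above and assumes $v_x(\sigma) \in [0, 1]$ for all $\sigma \in \meassp_n$), measuring the last qubit of $\cO_x \ket{0^{n+1}}$ in the computational basis yields the outcome $1$ with probability
\begin{equation*}
    \sum_{\sigma \in \meassp_n} p_x(\sigma) v_x(\sigma) = \ex{y_x} = \mreward(x),
\end{equation*}
since the states $\ket{\sigma}$ are mutually orthogonal and the amplitude of $\ket{\sigma}\ket{1}$ is $\sqrt{p_x(\sigma) v_x(\sigma)}$.
In other words, a single call to $\cO_x$ followed by this measurement produces a Bernoulli sample with mean $\mreward(x)$.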


In this paper, we consider the following sequential decision-making problem.
For each round $t = 1, \dots, T$, a player selects an action $x_t \in \cX$ 
and incurs an instantaneous regret $\mreward(x^*) - \mreward(x_t)$,
where $x^* = \argmax_{x \in \cX}\mreward(x)$.
During the process,
the player may apply arbitrary unitary operators and perform measurements;
however, we assume that the total number of calls to the quantum reward oracles $\cO_x, \cO_x^{\dagger}$ is at most $T$.
The objective of the player is to minimize the cumulative regret defined as 
\begin{math}
    R(T) = \sum_{t=1}^T \left(\mreward(x^* ) - \mreward(x_t)\right).
\end{math}
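To illustrate the definition with a hypothetical two-action instance (the gap $\Delta$ and the count $N_T$ are introduced here only for illustration), suppose that $\cX = \{x^*, x'\}$ with $\mreward(x^*) - \mreward(x') = \Delta > 0$, and let $N_T$ denote the number of rounds $t \le T$ in which $x_t = x'$. Then
\begin{equation*}
    R(T) = \Delta N_T,
\end{equation*}
so always selecting $x'$ gives the linear regret $R(T) = \Delta T$, whereas any strategy with $N_T = o(T)$ has sublinear regret.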

We note that we can apply any classical bandit algorithm for regret minimization to our problem setting.
More precisely, for each round $t$ and a selected action $x_t$, by invoking the quantum reward oracle $\cO_{x_t}$
and performing a measurement, we observe a random reward $\mreward(x_t) + \varepsilon_t \in [0, 1]$ 
whose expectation is $\mreward(x_t)$.
Therefore, based on observed rewards $\mreward(x_1)+\varepsilon_1, \dots, \mreward(x_{t-1})+\varepsilon_{t-1}$,
any classical bandit algorithm can select an action $x_t$ to minimize the cumulative regret.
However, in our problem setting, the player can additionally perform quantum computation, subject to the limit on the number of 
oracle queries.
Therefore, an optimal algorithm for quantum bandits could potentially perform much better than 
classical bandit algorithms in terms of cumulative regret.

\subsection{Mean Reward Function and RKHS}
\label{subsec:reward-function}
We make the following assumption on the mean reward function $\mreward: \cX \rightarrow [0, 1]$.\footnote{We refer to the remark after \Cref{assump:upper-bd} for the validity of this assumption.}
Let $k: \cX \times \cX \rightarrow \RR$ be a positive semi-definite kernel.
We denote by $\cH_k$ the RKHS corresponding to the kernel $k$, i.e., 
$\cH_k$ is the space of real-valued functions on $\cX$ satisfying the following three conditions.
(i) $\cH_k$ is a real Hilbert space.
(ii) For any $x' \in \cX$, the feature vector $\phi(x')$ belongs to $\cH_k$,
where $\phi(x')$ is a function on $\cX$ defined as $x \mapsto k(x, x')$.
(iii) For any $f \in \cH_k$ and $x \in \cX$, we have $\langle f,  \phi(x)\rangle = f(x)$.
The last property is called the reproducing property.
We call the map $\cX \ni x \mapsto \phi(x) \in \cH_k$ the feature map of the RKHS.
For $f, g \in \cH_k$, we denote the inner product $\langle f, g\rangle$ by $f^\trn g$.
We assume that the mean reward function $\mreward$ belongs to $\cH_k$, i.e.,
there exists $\theta^* \in \cH_k$ such that $\mreward(x) = \phi(x)^\trn \theta^*$.
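
Two standard consequences of the reproducing property are worth recording (both follow from conditions (ii) and (iii) together with the Cauchy--Schwarz inequality, and are stated here only as a sketch). For any $x, x' \in \cX$, we have
\begin{equation*}
    k(x, x') = \phi(x)^\trn \phi(x'), \qquad
    |\mreward(x)| = |\phi(x)^\trn \theta^*| \le \|\theta^*\|_{\cH_k} \sqrt{k(x, x)},
\end{equation*}
where $\|\cdot\|_{\cH_k}$ denotes the norm of $\cH_k$.
The first identity is obtained by applying (iii) to $f = \phi(x')$, and the second uses $\|\phi(x)\|_{\cH_k}^2 = k(x, x)$.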

Examples of kernels defined on $\RR^d \times \RR^d$ 
include the squared exponential (SE) kernels,
Mat\'ern-$\nu$ kernels, and rational quadratic (RQ) kernels.
Let $l > 0$ be a length-scale parameter.
An SE kernel $\kse$ is defined by 
$\kse(x, y) = \exp\left(-\| x - y\|^2_2/l\right)$, where $x, y \in \RR^d$.
A Mat\'ern kernel $\kmatern$ is defined on $\RR^d \times \RR^d$
by $\kmatern(x, y) = \frac{2^{1-\nu}}{\Gamma(\nu)}
(a\sqrt{2\nu})^\nu K_\nu(a\sqrt{2\nu})$ for $x, y \in \RR^d$,
where $\nu > 0$ is a smoothness parameter,
$a = \|x-y\|_2/l$, and $K_\nu$ is the modified Bessel function of the second kind.
An RQ kernel $\krq$ is defined by 
$\krq(x, y) = \left(1 +\| x-y\|_2^2/(2\nu l^2) \right)^{-\nu}$ for $x, y \in \RR^d$,
where $\nu > 0$ is a parameter.
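
As a sanity check of these formulas (a worked special case, not an additional assumption), take $\nu = 1/2$ in the Mat\'ern kernel, so that $a\sqrt{2\nu} = a$.
Using $K_{1/2}(z) = \sqrt{\pi/(2z)}\, e^{-z}$ and $\Gamma(1/2) = \sqrt{\pi}$, we obtain for $x \neq y$
\begin{equation*}
    \kmatern(x, y) = \frac{2^{1/2}}{\sqrt{\pi}}\, a^{1/2} \sqrt{\frac{\pi}{2a}}\, e^{-a}
    = \exp\left(-\|x - y\|_2 / l\right),
\end{equation*}
i.e., the exponential kernel.
Similarly, letting $\nu \to \infty$ in the RQ kernel gives $\krq(x, y) \to \exp\left(-\|x - y\|_2^2/(2 l^2)\right)$, which is an SE kernel up to a reparametrization of the length scale.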
% We also assume the following boundedness conditions 
% $|k(x, x') | \le \overline{k}$ and $\|\theta^*\|_{\cH_k} \le S$ with $\overline{k}, S > 0$


% In our analysis, for simplicity, we assume that $\dim \cH_k$ is finite.
% For example, this condition is satisfied if the action set $\cX$ is finite (e.g., $\cX$ is a discretization of a domain 
% in a Euclidean space) since $\dim \cH_k \le |\cX|$.
% We note that our regret bounds do not depend on $\dim \cH_k$ unlike \citep{wan2023quantum},
% and if the decay rate of the eigenvalues of the Mercer operator is fast, 
% then our regret bounds can be much smaller $\dim \cH_k$.


% We give Quantum linear stochastic algorithm for nonlinear reward function by using kernel method.
% It has $T$ rounds, each round $t$, a learner chooses an action$x \in \mathcal{X}$ and observes a reward$\phi(x)$,where $\mathcal{X}\subseteq \mathbb{R}^d$ and $\phi :\mathcal{X}\rightarrow\mathcal{H}$.
% $\phi$ is defined over a compact set
% $\mathcal{X} \subseteq \mathbb{R}^d$ non-linear and assumed to live
% in a reproducing kernel Hilbert space (RKHS).
% % (revise the wording above)
% This Hilbert space is usually assumed to be infinite.
% In this paper we assume it has sufficiently large dimension.(Not infinite).
% In particular, $\|\phi\|_{H_k} \leq 1$, where $\|\cdot\|_{H_k}$ denote the RKHS norm.
% There is an unknown parameter $\theta^{\ast} \in \mathbb{R}^d$ which determines the mean
% reward of each action.
% The expected reward of action $x$ is $\mu(x) = \phi(x)^t\theta^*\in [0, 1]$.
% when the arm $x_s$ is selected, Oracle $\mathcal{O}_{x_{s}}$ is called 
% to obtain an estimation reward $\phi(x_s^T)\theta^*$. 
% The goal is to minimize the cumulative regret
% \begin{equation}
%     R(T)=\sup_{x \in \cX}\sum_{t=1}^{T}(\phi(x^*)-\phi(x_t))^t\theta^*.
% \end{equation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Mercer's Theorem}
As we stated in the introduction, our regret bounds involve the decay rate of the eigenvalues of the Mercer operator.
Here, following \cite[Chapter 4.5]{steinwart2008support}, we briefly review the theoretical properties of the Mercer operator.
% In this section,we give some difinitions on RKHS and Mercer’s Theorem.  \par
% Let$\mathcal{X}$ be a compact metric space, $k :\mathcal{X} \times \mathcal{X}  \rightarrow \mathbb{R}$.The Hilbert space $\mathcal{H}$ that has functions on $\mathcal{X}$ equipped with an inner product$<>$ is called reproducing kernel Hilbert space
% (RKHS). 
Let $\cX$ be a measurable space, $\nu$ a finite measure on $\cX$, and $k: \cX \times \cX \rightarrow \RR$ a measurable kernel.
We denote by $L_2(\nu)$ the space of square-integrable functions on $\cX$ with respect to the measure $\nu$.
We define an integral operator $\cT_k: L_2(\nu) \rightarrow L_2(\nu)$, called the Mercer operator, by
$f \mapsto \int_{\cX} k(\cdot, x)f(x) d\nu(x)$.
Since $\cT_k$ is compact, positive, and self-adjoint, 
by the spectral theorem, 
there exist an at most countable orthonormal system $\{\psi_i\}_{i \in I}$ in $L_2(\nu)$ and positive numbers $\{\lambda_i\}_{i \in I}$
such that for any $f \in L_2(\nu)$, 
$\cT_k f$ has the expansion 
$\cT_k f = \sum_{i \in I} \lambda_i \langle \psi_i, f \rangle_{L_2(\nu)}\psi_i$.
Here, $\{\lambda_i\}_{i \in I}$ are the non-zero eigenvalues of $\cT_k$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots > 0$, and $\psi_i$ is an eigenfunction corresponding to $\lambda_i$.
We refer to \cite[Theorems 4.49 and 4.51]{steinwart2008support} for the following form of Mercer's theorem.
\begin{thm}[Mercer's Theorem]
    \label{thm:mercer}
    Let $\{\psi_i\}_{i \in I}$ and $\{\lambda_i\}_{i \in I}$ be defined as above.
    Assume that $\cX$ is a compact metric space, $k: \cX \times \cX \rightarrow \RR$ is a continuous kernel, and $\nu$ is a finite Borel measure with $\supp \nu = \cX$.
    Then, we have the following expansion:
    \begin{equation*}
        k(x, x') = \sum_{i \in I} \lambda_i \psi_i(x)\psi_i(x'), \quad x, x' \in \cX.
    \end{equation*}
    Here, the convergence is absolute and uniform.
    Moreover, $\{\lambda_i^{1/2}\psi_i\}_{i \in I}$ forms an orthonormal basis of $\cH_k$.
    % $\cH_k$ has the following characterization as a subspace of $L_2(\lambda)$.
    % \begin{equation*}
    %     \cH_k = \left\{f = \sum_{i \in I} a_i \lambda_i ^{1/2}\psi_i: \|f\|_{\cH_k}^2 = \sum_{i \in I} a_i^2 < \infty\right\}.
    % \end{equation*}
    % That is, 
    % $\{\lambda_i^{1/2}\psi_i\}_{i \in I}$ forms an orthonormal basis of $\cH_k$.
\end{thm}
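
Two consequences of Theorem~\ref{thm:mercer} are convenient in calculations (a sketch under the assumptions of the theorem).
First, since $\{\lambda_i^{1/2}\psi_i\}_{i \in I}$ is an orthonormal basis of $\cH_k$, every $f \in \cH_k$ can be written as
\begin{equation*}
    f = \sum_{i \in I} a_i \lambda_i^{1/2} \psi_i
    \quad \text{with} \quad
    \|f\|_{\cH_k}^2 = \sum_{i \in I} a_i^2 < \infty.
\end{equation*}
Second, setting $x = x'$ in the expansion of $k$ and integrating term by term (which is justified by the uniform convergence and the finiteness of $\nu$) gives
\begin{equation*}
    \int_{\cX} k(x, x) \, d\nu(x) = \sum_{i \in I} \lambda_i,
\end{equation*}
where we used $\|\psi_i\|_{L_2(\nu)} = 1$. In particular, the eigenvalues are summable.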

% We introduce a formal definition of polynomial and exponential eigendecay of the kernel $k$ 
% \citep[Definition 1]{vakili2021information}.
% \begin{dfn}[Polynomial and Exponential Eigendecay]
%     \label{def:eigendecay}
%     Let $\{\lambda_i\}_{i \in I}$ be the eigenvalues of the Mercer operator 
%     with $\lambda_1 \ge \lambda_2 \ge \cdots >0$ and $I \subseteq \ZZ_{\ge 1}$ as in Theorem \ref{thm:mercer}.
%     \begin{enumerate}
%         \item Let $\beta_p > 1$. We say $k$ has $\beta_p$ polynomial eigendecay 
%         if there exists a constant $C_p > 0$ such that $\lambda_n \le C_p n^{-\beta_p}$ for all $n \in I$.
%         \item Let $\beta_e > 0$. We say $k$ has $\beta_e$ exponential eigendecay if there exist constants
%         $C_{e, 1}, C_{e, 2} > 0$ such that $\lambda_n \le C_{e, 1} \exp(-C_{e, 2} n^{\beta_e})$ for all $n \in I$.
%     \end{enumerate}
% \end{dfn}

To discuss the theoretical properties of our algorithm (\Cref{alg:qmc-kernel-ucb} in Sec.~\ref{sec:method}), we introduce the following formal characterization of eigendecay, as defined in \citet[Definition~11]{chatterji2019online} and \citet[Definition~1]{vakili2021information}:
\begin{dfn}[Eigendecay]
    \label{def:eigendecay}
    Let $\{\lambda_i\}_{i \in I}$ be the eigenvalues of the Mercer operator 
    with $\lambda_1 \ge \lambda_2 \ge \cdots >0$ and $I \subseteq \ZZ_{\ge 1}$ as in Theorem \ref{thm:mercer}.
    \begin{enumerate}
        \item Let $C_p > 0$ and $\beta_p > 1$ be constants. 
        We say a kernel $k$ has a $(C_p, \beta_p)$ polynomial eigendecay, if for all $n\in I$, we have $\lambda_n \leq C_p n^{-\beta_p}$.
        \item Let $C_{e,1}, C_{e,2} > 0$ and $\beta_e > 0$ be constants.
        We say a kernel $k$ has a $(C_{e,1}, C_{e,2}, \beta_{e})$ exponential eigendecay, if for all $n\in I$, we have $\lambda_n \leq C_{e, 1} \exp(-C_{e, 2} n^{\beta_{e}})$.
    \end{enumerate}
    If the constants $C_p, C_{e, 1}, C_{e, 2}$ are not important, we simply say that $k$ has a $\beta_p$ polynomial eigendecay or a $\beta_e$ exponential eigendecay.
\end{dfn}
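
To illustrate how these conditions control the tail of the spectrum (an elementary calculation, assuming $I = \ZZ_{\ge 1}$ for simplicity; the truncation level $D$ is introduced here only for illustration), suppose that $k$ has a $(C_p, \beta_p)$ polynomial eigendecay. Comparing the sum with an integral yields, for any $D \in \ZZ_{\ge 1}$,
\begin{equation*}
    \sum_{n > D} \lambda_n
    \le C_p \sum_{n > D} n^{-\beta_p}
    \le C_p \int_{D}^{\infty} t^{-\beta_p} \, dt
    = \frac{C_p}{\beta_p - 1} D^{1 - \beta_p},
\end{equation*}
which tends to zero as $D \to \infty$ because $\beta_p > 1$.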

We provide examples of eigendecay of kernels in the case when $\cX$ is a compact subset of $\RR^d$.
It is known that a Mat\'ern kernel with a smoothness parameter $\nu>0$ has $(2\nu + d)/d$ polynomial eigendecay 
\citep[Theorem 15]{santin2016approximation}.
If $k$ is an SE or RQ kernel, then $k$ has $1/d$ exponential eigendecay.
The latter statement follows from \citet[Theorem 15]{santin2016approximation} and \citet[Theorem 11.22]{wendland2004scattered}.
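
For a concrete instance of these rates (simple arithmetic based on the statements above), take $d = 2$.
A Mat\'ern kernel with $\nu = 5/2$ then has $\beta_p = (2 \cdot 5/2 + 2)/2 = 7/2$ polynomial eigendecay, i.e., $\lambda_n \le C_p n^{-7/2}$ for some constant $C_p > 0$,
while an SE or RQ kernel has $\beta_e = 1/2$ exponential eigendecay, i.e., $\lambda_n \le C_{e,1} \exp(-C_{e,2} \sqrt{n})$ for some constants $C_{e,1}, C_{e,2} > 0$.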