Recently, \cite{dai2023quantum} extended the linear reward model \citep{wan2023quantum} 
to the kernelized case, which is the problem setting considered in this paper.
Compared with \cite{dai2023quantum}, this paper offers the following advantages.
% \textbf{(i) Corrected upper bounds}. 
\strevision{\textbf{(i) Theoretical analysis without the unbiasedness assumption of the QMC estimator}. }
\cite{dai2023quantum} provided regret upper bounds (e.g., $\widetilde{O}(\log^{3(d+1)/2}(T))$ in the case of squared exponential kernels);
however, their proof implicitly assumes that 
the quantum Monte Carlo method \citep{montanaro2015quantum} is an unbiased estimator.
% As we will detail in Sec. \ref{subsec:method-comparison-to-qbo}, this assumption is unlikely to hold.
As we will detail in Sec. \ref{subsec:method-comparison-to-qbo}, this assumption is unlikely to hold.
Our regret bounds do not rely on the unbiasedness assumption.
Thus, our analysis is more mathematically rigorous than that of \cite{dai2023quantum}.
\strevision{\textbf{(ii) Improved regret bounds}.
Even under the unbiasedness assumption of the QMC estimator, 
our regret bound $\widetilde{O}\left( T^{\frac{3}{1 + \beta_p}} \log\left(\frac{1}{\delta} \right)\right)$ improves on 
the bound $\widetilde{O}\left( T^{\frac{3}{\beta_p}} \log\left(\frac{1}{\delta} \right)\right)$ of Q-GP-UCB \citep{dai2023quantum} in the case of $\beta_p$-polynomial eigendecay.}
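As a quick check of this comparison, assuming $\beta_p > 0$ and $T > 1$ (assumptions stated here only for illustration), the two exponents satisfy
\[
  \frac{3}{\beta_p} - \frac{3}{1 + \beta_p} = \frac{3}{\beta_p (1 + \beta_p)} > 0,
  \qquad \text{so} \qquad
  \frac{T^{\frac{3}{\beta_p}}}{T^{\frac{3}{1 + \beta_p}}} = T^{\frac{3}{\beta_p (1 + \beta_p)}} > 1 .
\]
That is, ignoring logarithmic factors, the two bounds differ by a polynomial factor in $T$.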
\textbf{(iii) A novel tradeoff parameter $\eta$}. 
Both our algorithm (Algorithm \ref{alg:qmc-kernel-ucb}) and Q-GP-UCB \citep{dai2023quantum} 
extend QLinUCB \citep{wan2023quantum} to the kernelized case, and both divide 
the time interval into several stages.
There is a tradeoff between the total number of stages and the regret incurred in each stage.
We not only extend QLinUCB to the kernelized case but also introduce a novel tradeoff parameter $\eta$,
which is key to obtaining the aforementioned improved regret bounds.
% (see also Proposition \ref{prop:totnst-ub} and a remark after it).
\textbf{(iv) A novel proof technique for bounding the (weighted) information gain}.
Both this paper and \cite{dai2023quantum} provide an upper bound on the ``weighted information gain'' $\qinfgain$
(see \eqref{eq:gamma-def} for the definition and Corollary \ref{cor:gamma-bound} for the bound),
which is an analogue of \citep[Theorem 3]{vakili2021information}.
While \cite{dai2023quantum} largely follow the proof of \citep[Theorem 3]{vakili2021information},
we provide a more general result (Proposition \ref{prop:log-det-ineq}) that includes these results.
In particular, our argument yields a simple alternative proof of \citep[Theorem 3]{vakili2021information}.
