\section{Simulations}\label{sec:sim}
We compare the performance of \algo~(Algorithm~\ref{algo:zorl}) with that of five competitors: UCRL2~\citep{jaksch2010near}; TSDE~\citep{ouyang2017learning}; RVI-Q~\citep{borkar2000ode}, a Q-learning algorithm for average-reward RL; ZoRL-$\eps$~\citep{kar2024adaptive}; and the heuristic algorithm PZRL-H~\citep{kar2024policy}. Competitors that are designed for finite state-action spaces are applied to a uniform discretization of $\cS \times \cA$ that is performed once, at time $t=0$.

Simulation experiments are conducted on the following systems. (i)~\texttt{Continuous RiverSwim}, which models an agent swimming in a river. (ii)~Linear quadratic~(LQ) control systems~\citep{abbasi2011regret}, in which the state evolves as $s_{t+1} = A s_t + B a_t + w_t$; we truncate the state-action space so that it is compact. We denote the two such systems, of dimensions $2 \times 2$ and $2 \times 4$, by \texttt{Truncated LQ-$1$} and \texttt{Truncated LQ-$2$}, respectively. (iii)~A \texttt{Non-linear System}, in which the state evolves as $s_{t+1} = A f(s_t) + B g(a_t) + w_t$ for non-linear functions $f$ and $g$; as for the truncated LQ systems, we truncate its state-action space. Details of the environments are given in~\citet{kar2024adaptive} and in Appendix~\ref{app:sim}.

Figure~\ref{fig:perf} plots the cumulative rewards averaged over $50$ runs. On each environment, \algo~achieves the highest cumulative reward among all six algorithms. Very recently, \citet{kar2024policy} have replaced PZRL-H with two algorithms, PZRL-MB and PZRL-MF; Appendix~\ref{app:sim} compares their performance with that of \algo.
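Competitors designed for finite state-action spaces are run on a uniform discretization of $\cS \times \cA$. As an illustration only, the Python sketch below shows one way to build such a grid. It assumes that $\cS$ and $\cA$ are axis-aligned boxes and that every coordinate is split into the same number of cells; the function name \texttt{make\_uniform\_grid} is hypothetical, and the grid resolution used in the experiments is not specified here. A continuous point is mapped to the integer label of the cell that contains it, and (as a further assumption) a discrete label can be turned back into a continuous action by playing the center of its cell.

```python
import numpy as np


def make_uniform_grid(low, high, bins):
    """Uniform grid over the box [low, high] with `bins` cells per coordinate.

    Returns (cell_index, cell_center): cell_index maps a point to an integer in
    [0, bins**d), and cell_center maps such an integer to the midpoint of its cell.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    shape = (bins,) * low.size       # number of cells along each coordinate
    width = (high - low) / bins      # side length of a cell in each coordinate

    def cell_index(x):
        # Per-coordinate cell index, clipped so that points on (or beyond)
        # the boundary of the box map to an edge cell.
        idx = np.floor((np.asarray(x, dtype=float) - low) / width).astype(int)
        idx = np.clip(idx, 0, bins - 1)
        # Flatten the multi-index into a single discrete state/action label.
        return int(np.ravel_multi_index(tuple(idx), shape))

    def cell_center(i):
        # Recover the multi-index of label i and return the midpoint of that cell.
        idx = np.array(np.unravel_index(i, shape))
        return low + (idx + 0.5) * width

    return cell_index, cell_center
```

For instance, with \texttt{bins=4} on $[0,1]^2$ the point $(0.3, 0.6)$ lies in cell $6$, whose center is $(0.375, 0.625)$. Under this sketch, a finite-space learner would observe the state label \texttt{cell\_index(s)} from a grid on $\cS$, choose an action label \texttt{j}, and execute \texttt{cell\_center(j)} from a separate grid on $\cA$.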

\begin{figure}[ht]
    \centering
    \begin{subfigure}[b]{0.49\linewidth}
        \centering
        \includegraphics[width=\textwidth]{figures/RiverSwim_cumul_reward.pdf} 
        \caption{Continuous RiverSwim}
        \label{fig:rm_rew}
    \end{subfigure}
    \begin{subfigure}[b]{0.49\linewidth}
        \centering
        \includegraphics[width=\textwidth]{figures/LinSys2x2_cumul_reward.pdf} 
        \caption{Truncated LQ-$1$}
        \label{fig:ls1_rew}
    \end{subfigure}
    \begin{subfigure}[b]{0.49\linewidth}
        \centering
        \includegraphics[width=\textwidth]{figures/LinSys2x4_cumul_reward.pdf}
        \caption{Truncated LQ-$2$}
        \label{fig:ls2_rew}
    \end{subfigure}
    \begin{subfigure}[b]{0.49\linewidth}
        \centering
        \includegraphics[width=\textwidth]{figures/NonLinSys_cumul_reward.pdf}
        \caption{Non-linear System}
        \label{fig:nls_rew}
    \end{subfigure}
    \caption{Cumulative rewards of the six algorithms on each environment, averaged over $50$ runs.}
    \label{fig:perf}
\end{figure}
\vspace{-5pt}
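The curves in Figure~\ref{fig:perf} average the cumulative reward over $50$ independent runs. The following Python sketch illustrates that computation for truncated dynamics of the form $s_{t+1} = A f(s_t) + B g(a_t) + w_t$, which reduce to the LQ case when $f$ and $g$ are the identity. It is not the experimental code: Gaussian noise, truncation by clipping every coordinate to $[-b, b]$, the zero initial state, and the negative quadratic reward are all assumptions made for illustration (the actual environment specifications are in Appendix~\ref{app:sim}), and all function and parameter names are hypothetical.

```python
import numpy as np


def truncated_step(s, a, A, B, f, g, noise_std, bound, rng):
    """One transition s' = A f(s) + B g(a) + w, clipped to [-bound, bound]^d.

    Clipping is an assumed form of truncation; w is assumed Gaussian.
    """
    w = rng.normal(0.0, noise_std, size=s.shape)
    return np.clip(A @ f(s) + B @ g(a) + w, -bound, bound)


def averaged_cumulative_reward(policy, A, B, T, n_runs=50, f=lambda x: x,
                               g=lambda x: x, noise_std=0.1, bound=1.0, seed=0):
    """Cumulative reward at each step t < T, averaged over n_runs rollouts."""
    rng = np.random.default_rng(seed)
    curves = np.empty((n_runs, T))
    for k in range(n_runs):
        s = np.zeros(A.shape[0])  # assumed initial state
        rewards = np.empty(T)
        for t in range(T):
            a = np.clip(policy(s), -bound, bound)  # keep actions in the truncated set
            rewards[t] = -(s @ s + a @ a)  # assumed reward: negative quadratic cost
            s = truncated_step(s, a, A, B, f, g, noise_std, bound, rng)
        curves[k] = np.cumsum(rewards)  # cumulative reward of run k
    return curves.mean(axis=0)  # average across runs, step by step


if __name__ == "__main__":
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.eye(2)
    curve = averaged_cumulative_reward(lambda s: -0.5 * s, A, B, T=100)
    print(curve[-1])
```

Averaging the per-run cumulative sums step by step, rather than averaging rewards first, yields directly the quantity plotted against time in each panel.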