\section{Conclusion}\label{sec:conc}
We propose a computationally efficient algorithm for average reward RL for Lipschitz MDPs in continuous spaces, and show that it is truly adaptive, i.e., it achieves a regret of $\ctO\big(T^{1 - \deff\inv}\big)$, where $\deff = 2 d_\cS + d_z + 3$.~The zooming dimension $d_z$ is a problem-dependent quantity that measures the size of the set of near-optimal state-action pairs and is bounded above by $d$, the dimension of the state-action space.~Simulation experiments support the theoretical findings: \algo~outperforms popular fixed discretization-based algorithms as well as adaptive discretization-based algorithms.