In this work, we studied low-rank MDPs characterized by the instance-dependent properties \(\Delta_{\textnormal{min}}\) (minimal sub-optimality gap) and \(d_{\textnormal{min}}^{\star}\) (minimal optimal occupancy). We extended the existing \textsc{REP-UCB} algorithm with a double exploration strategy and a constrained optimization objective, and showed that the resulting algorithm can leverage good representations for more efficient exploration. Additionally, we demonstrated that our algorithm enjoys constant regret in low-rank MDPs and provided a necessary and sufficient condition for the existence of good representations.

An interesting direction for future work is to design computationally efficient variants of our proposed algorithm and to test them on deep RL benchmarks.
%Furthermore, we would like to assert whether constant regret is achievable without knowledge of the minimal sub-optimality gap and the minimal optimal occupancy. 
Furthermore, it would be interesting to understand whether UniSOFT features are necessary for instance-dependent regret.