Track: Research Track
Keywords: Abstractions, Bisimulation, Model Irrelevance, Regret Bound
Abstract: State abstraction is a key tool for scaling reinforcement learning (RL) by reducing the complexity of the underlying Markov Decision Process (MDP). Among abstraction methods, bisimulation has emerged as a principled metric-based approach, yet its regret properties remain less understood than those of model irrelevance abstractions. In this work, we clarify the relationship between these two abstraction families: while model irrelevance implies bisimulation, the converse does not hold, so bisimulation yields coarser abstractions. We provide the first regret bounds for policies derived from approximate bisimulation abstractions, analyzing both naive and smart refinement strategies for lifting abstract policies back to the original MDP. Our theoretical results show that smart refinement enjoys strictly better regret guarantees, and our experiments on Garnet MDPs confirm that this advantage translates into significant performance improvements. We further explain this gap through the action gap phenomenon in RL, which accounts for why some refinement strategies yield substantially better behavior in practice.
Submission Number: 140
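To make the setting concrete, here is a minimal sketch (not the authors' code) of two ingredients named in the abstract: a Garnet-style random MDP and a naive lift of an abstract policy, in which every ground state simply copies the action chosen for its abstract state. The function names `make_garnet` and `lift_policy_naive`, and the random assignment of states to abstract blocks, are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch only: Garnet MDP generation and naive policy lifting.
import numpy as np


def make_garnet(n_states, n_actions, branching, rng):
    """Random Garnet MDP: each (s, a) pair reaches `branching` successor states."""
    P = np.zeros((n_states, n_actions, n_states))
    R = rng.uniform(size=(n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[s, a, succ] = rng.dirichlet(np.ones(branching))
    return P, R


def lift_policy_naive(abstract_policy, state_to_block):
    """Naive refinement: each ground state copies its abstract state's action."""
    return np.array([abstract_policy[state_to_block[s]]
                     for s in range(len(state_to_block))])


rng = np.random.default_rng(0)
P, R = make_garnet(n_states=20, n_actions=4, branching=3, rng=rng)
# Hypothetical grouping of the 20 ground states into 5 abstract blocks
# (in the paper such blocks would come from an approximate bisimulation).
state_to_block = rng.integers(0, 5, size=20)
abstract_policy = rng.integers(0, 4, size=5)  # one action per abstract state
ground_policy = lift_policy_naive(abstract_policy, state_to_block)
print(ground_policy)
```

A smarter refinement would instead re-select actions per ground state using information from the original MDP, which is the distinction the regret analysis in the abstract is about.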