Revisiting Bisimulation: A Sampling-Based State Similarity Pseudo-metric

01 Mar 2023 (modified: 01 Jun 2023) · Submitted to Tiny Papers @ ICLR 2023 · Readers: Everyone
Keywords: representation learning, reinforcement learning, state similarity metric
TL;DR: We present a novel sampling-based state-similarity pseudo-metric for MDPs that enjoys appealing theoretical properties, which we also illustrate empirically.
Abstract: In reinforcement learning (RL), we typically deal with systems whose large or continuous state spaces are encoded in an unstructured way. Because it is not possible to represent the value of each state individually, it is necessary to learn a structured representation from limited state samples that expresses the value function in a more meaningful way. One approach is to endow the set of states with a behavioural metric, such that two states that are close in the metric space are also close in the space of value functions. While there exist several notions of state similarity, they are either not amenable to sample-based algorithms \citep{ferns2004metrics, ferns05metrics}, require additional assumptions \citep{castro2020scalable, zhang2020learning, agarwal2021pse}, or yield limited theoretical guarantees \citep{castro2021mico}. In this paper, we present a new behavioural pseudo-metric, PMiCo, to overcome these shortcomings. PMiCo is based on a recent sampling-based behavioural distance, MICo \citep[Matching under Independent Couplings;][]{castro2021mico}, but enjoys stronger theoretical properties, which we also illustrate empirically.
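For intuition on the sampling-based distance that PMiCo builds upon, below is a minimal sketch of a tabular, stochastic MICo-style update on a toy two-state MDP. The MDP (rewards, transitions, discount) and the learning-rate schedule are illustrative assumptions, not taken from the paper; the update rule follows MICo's independent-coupling form, U(x, y) ← |r_x − r_y| + γ U(x′, y′) with x′ and y′ sampled independently.

```python
import random

# Toy MDP under a fixed policy (illustrative assumption, not from the paper).
GAMMA = 0.9
STATES = [0, 1]
REWARD = {0: 0.0, 1: 1.0}       # per-state reward under the policy
NEXT = {0: [0, 1], 1: [1, 1]}   # equiprobable next-state samples

def mico_style_updates(U, lr=0.1, steps=5000, seed=0):
    """Stochastic MICo-style updates with independent couplings:
    U(x, y) <- (1 - lr) * U(x, y)
               + lr * (|r_x - r_y| + GAMMA * U(x', y')),
    where x' and y' are sampled independently from their transition kernels."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(STATES), rng.choice(STATES)
        xp, yp = rng.choice(NEXT[x]), rng.choice(NEXT[y])
        target = abs(REWARD[x] - REWARD[y]) + GAMMA * U[(xp, yp)]
        U[(x, y)] += lr * (target - U[(x, y)])
    return U

U = {(x, y): 0.0 for x in STATES for y in STATES}
U = mico_style_updates(U)
print(U[(0, 1)])  # sample-based distance estimate between states 0 and 1
```

Note that, unlike a true metric, the independent couplings can yield a nonzero self-distance U(x, x) for stochastic states (here, state 0), which is why MICo-style distances are pseudo-metrics rather than metrics.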