MICo: Improved representations via sampling-based state similarity for Markov decision processes

Pablo Samuel Castro; Tyler Kastner; Prakash Panangaden; Mark Rowland

Published: 09 Nov 2021, Last Modified: 05 May 2023, NeurIPS 2021 Poster, Readers: Everyone

Keywords: Reinforcement Learning, Metrics, Deep Reinforcement Learning

TL;DR: We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents.

Abstract: We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and a lack of sample-based algorithms, our newly proposed distance addresses both of these issues. In addition to providing detailed theoretical analyses, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.

Supplementary Material: pdf

Code: https://github.com/google-research/google-research/tree/master/mico
