11 Summaries of Papers on Explainable Reinforcement Learning With Some Commentary

Anonymous

17 Jan 2022 (modified: 05 May 2023). Submitted to BT@ICLR2022. Readers: Everyone
Keywords: interpretability, explainability, transparency, reinforcement learning
Abstract: Model interpretability was a bullet point in Concrete Problems in AI Safety (2016). Since then, interpretability has come to comprise entire research directions in technical safety agendas (2020). Interpretability is now a popular area of research, mainstream enough that there are books on the topic and corporate services promising to provide it. Interpretability for reinforcement learning, however, has received much less attention than interpretability for supervised learning. So what is the state of research on this topic? What does progress in interpretable RL look like, and are we making progress? This post summarizes 11 recent papers on explaining reinforcement learning agents (from ICLR and related conferences), then provides commentary on the research. The summaries, not the commentary, are the main point of this post. Though people like paper summaries, this is the kind of interpretive labor that isn't traditionally awarded space in research venues. We primarily select papers that appeared between 2018 and 2020, in order to bridge the gap between foundational papers published in 2010-2017 and the more recent and diverse directions of research in the field.
ICLR Paper: https://openreview.net/forum?id=rylvYaNYDH, https://openreview.net/forum?id=H1xFWgrFPS, https://openreview.net/forum?id=rkl3m1BFDB