Design for Interpretability

Anonymous

Published: 24 May 2019, Last Modified: 05 May 2023 | XAIP 2019 | Readers: Everyone

Keywords: explicability, legibility, predictability, environment design

TL;DR: We present an approach to redesign the environment such that uninterpretable agent behaviors are minimized or eliminated.

Abstract: The interpretability of an AI agent's behavior is of utmost importance for effective human-AI interaction. To this end, there has been increasing interest in characterizing and generating interpretable agent behavior. An alternative way to guarantee that the agent behaves interpretably is to design the agent's environment such that uninterpretable behaviors are either prohibitively expensive or unavailable to the agent. To date, work under the umbrella of goal or plan recognition design has explored this notion of environment redesign for some specific instances of interpretable behavior. In this position paper, we scope the landscape of interpretable behavior and environment redesign in all its different flavors. Specifically, we focus on three types of interpretable behavior -- explicability, legibility, and predictability -- and present a general framework for the problem of environment design that can be instantiated to achieve each of the three. We also discuss how specific instantiations of this framework correspond to prior works on environment design and identify exciting opportunities for future work.

Author Identity Visibility: Reveal author identities to reviewers
