(How) Can AI Bots Lie?


Apr 16, 2019 · ICAPS 2019 Workshop XAIP (Blind Submission)
  • Keywords: Explanations, Model Reconciliation, Lies
  • TL;DR: Model Reconciliation is an established framework for plan explanations, but can be easily hijacked to produce lies.
  • Abstract: Recent work on explanation generation for decision-making problems has viewed the explanation process as one of model reconciliation, where an AI agent brings the human's mental model (of its capabilities, beliefs, and goals) onto the same page with regard to the task at hand. This formulation succinctly captures many possible types of explanations and explicitly addresses the various properties -- e.g. the social aspects, contrastiveness, and selectiveness -- of explanations studied in the social sciences in human-human interactions. However, it turns out that the same process can be hijacked into producing "alternative explanations" -- i.e. explanations that are not true but still satisfy all the properties of a proper explanation. In previous work, we have looked at how such explanations may be perceived by the human in the loop and alluded to one possible way of generating them. In this paper, we go into more detail on this curious feature of the model reconciliation process and discuss its implications for the overall notion of explainable decision-making.
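The abstract's core observation can be illustrated with a toy sketch. The following is a hypothetical, deliberately crude rendition of model reconciliation (not the authors' implementation): models are sets of facts, an "explanation" is a minimal set of fact-updates that makes the plan check out in the human's mental model, and a "lie" is an update the agent's own model does not support. All facts and names here are invented for illustration.

```python
from itertools import combinations

# The plan succeeds if any one of these sufficient condition sets holds.
sufficient_conditions = [
    {"door_unlocked", "corridor_clear"},   # the route that is actually true
    {"magic_portal_open"},                 # a route the robot knows is false
]

robot_model = {"door_unlocked", "corridor_clear", "has_key"}
human_model = {"has_key"}   # the human is missing two true facts

def plan_valid(model):
    """Crude stand-in for 'the plan is optimal/valid in this model'."""
    return any(cond <= model for cond in sufficient_conditions)

def minimal_explanations(candidates, human):
    """Smallest sets of fact-updates that make the plan valid for the human."""
    for k in range(1, len(candidates) + 1):
        hits = [set(c) for c in combinations(sorted(candidates), k)
                if plan_valid(human | set(c))]
        if hits:
            return hits   # all explanations of minimal size
    return []

# Honest reconciliation: updates are drawn only from the robot's own model,
# so every explanation is true by construction.
honest = minimal_explanations(robot_model - human_model, human_model)

# The hijacked variant: drop that constraint, and a fabricated fact that
# patches the human model counts as an explanation too -- and here the lie
# is even *shorter* (more selective) than the honest explanation.
fabricated = {"magic_portal_open"}
dishonest = minimal_explanations((robot_model - human_model) | fabricated,
                                 human_model)
```

Under these assumptions, the honest search returns the two-fact update `{"door_unlocked", "corridor_clear"}`, while the unconstrained search returns the single false fact `{"magic_portal_open"}`, matching the paper's point that a lie can satisfy every formal property of a proper explanation.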