(How) Can AI Bots Lie?

Anonymous

Published: 24 May 2019, Last Modified: 05 May 2023
Venue: XAIP 2019
Keywords: Explanations, Model Reconciliation, Lies
TL;DR: Model Reconciliation is an established framework for plan explanations, but can be easily hijacked to produce lies.
Abstract: Recent work on explanations for decision-making problems has viewed the explanation process as one of model reconciliation, where an AI agent brings the human's mental model of its capabilities, beliefs, and goals onto the same page with regard to the task at hand. This formulation succinctly captures many possible types of explanations and explicitly addresses the properties of explanations studied in the social sciences for human-human interactions, e.g. their social, contrastive, and selective aspects. However, it turns out that the same process can be hijacked into producing “alternative explanations” that are not true but still satisfy all these properties of a proper explanation. In AIES 2019, we discussed when such behavior may be appropriate but did not go into the details of how exactly such explanations can be generated. In this paper, we examine this curious feature of the model reconciliation process, a well-established framework for generating explanations of decision-making problems, and formalize the relationship between explanations, lies, and persuasion within that framework.
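To make the idea concrete, here is a minimal conceptual sketch, under my own simplifying assumptions rather than the paper's formalism: models are flat sets of facts, a plan counts as "optimal" in a model whenever the model contains every fact the plan relies on, and the fact names and the `reconcile` helper are hypothetical placeholders for the actual planning-model definitions used in model reconciliation. The same minimal-update search that produces truthful explanations produces a lie the moment it is allowed to propose updates the agent does not itself hold.

```python
# Conceptual sketch only: models as sets of facts, "optimality" as set containment.
from itertools import combinations

def plan_is_optimal(model, plan_requires):
    """Stand-in for the optimality check a planner/validator would perform."""
    return plan_requires <= model

def reconcile(human_model, plan_requires, candidate_updates):
    """Search for a minimal set of fact additions that makes the plan
    appear optimal in the human's updated model of the robot."""
    for k in range(len(candidate_updates) + 1):
        for updates in combinations(sorted(candidate_updates), k):
            if plan_is_optimal(human_model | set(updates), plan_requires):
                return set(updates)
    return None  # no explanation of this form exists

robot_model = {"arm_works"}                    # hypothetical facts for illustration
human_model = {"arm_works"}
plan_requires = {"arm_works", "have_key"}

# Truthful explanation: candidate updates come only from the robot's own model.
print(reconcile(human_model, plan_requires, robot_model - human_model))   # None
# "Lie": the same search, fed a candidate fact the robot does not actually hold.
print(reconcile(human_model, plan_requires, {"have_key"}))                # {'have_key'}
```

In this toy setting no truthful explanation exists, yet the identical search returns a convincing one as soon as the candidate pool is unconstrained; the paper's contribution is to formalize and study exactly this gap.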
