Defining Deception in Decision Making

28 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0
Keywords: deception, AI safety
TL;DR: a definition for deception under the formalism of decision making
Abstract: With the growing capabilities of machine learning systems, particularly those that interact with humans, there is an increased risk of systems that can easily deceive and manipulate people. Preventing unintended behaviors therefore represents an important challenge for creating aligned AI systems. To approach this challenge in a principled way, we first need to define deception formally. In this work, we present a concrete definition of deception under the formalism of rational decision making in partially observed Markov decision processes. Specifically, we propose a general regret theory of deception under which the degree of deception can be quantified in terms of the actor's beliefs, actions, and utility. To evaluate our definition, we study the degree to which our definition aligns with human judgments about deception. We hope that our work will constitute a step toward both systems that aim to avoid deception, and detection mechanisms to identify deceptive agents.
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14282
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview