Abstract: Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of
AI systems. We focus on the problem that agents might deceive in order to achieve
their goals. There are a number of existing definitions of deception in the literature
on game theory and symbolic AI, but there is no overarching theory of deception
for learning agents in games. We introduce a functional definition of deception
in structural causal games, grounded in the philosophical literature. We present
several examples to establish that our formal definition captures philosophical and
commonsense desiderata for deception.