Strategic Deception in Deterministic Markov Decision Processes via Value Differences

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | License: CC BY 4.0
Keywords: deterministic Markov decision processes, deception
Abstract: We investigate the design of autonomous agents that deceive observers while executing tasks in complex deterministic environments. Recent research introduced an intent recognition model based on Q-differences, together with the ambiguity model (AM), which uses pre-trained Q-functions to select actions that are ambiguous across reward functions and thereby mislead observers. However, we find that AM fails to deceive effectively in deterministic Markov decision processes (DMDPs): because it maximizes entropy at every step, it produces many ineffective deceptive actions in the later stages of a task, after the agent's intention has already been revealed. To address this problem, we build on existing intent recognition based on state-value differences (V-differences) and propose the concept of the last deceptive state (LDS), a method for computing the optimal LDS, and two V-differences-based Deceptive Models (VDMs). VDMs plan deceptive trajectories in DMDPs, moving beyond the geometric constraints of traditional path planning. Experiments in path-planning domains demonstrate that VDMs achieve stronger deception and outperform AM across key metrics, including trajectory cost, deceptiveness, and steps after LDS.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 11479
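
The per-step entropy maximization that the abstract attributes to AM can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the softmax-over-negative-Q-differences posterior, the `beta` temperature, the function names, and the toy Q-values are all hypothetical. The sketch only shows the general idea of picking, in a given state, the action whose implied distribution over candidate goals has the highest entropy.

```python
import math

def goal_posterior(q_diffs, beta=1.0):
    # Assumed observer model: a smaller Q-difference makes a goal more likely.
    weights = [math.exp(-beta * d) for d in q_diffs]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_ambiguous_action(q_tables, state, actions, beta=1.0):
    # q_tables: one dict per candidate goal, mapping (state, action) -> Q-value.
    best_action, best_entropy = None, -1.0
    for a in actions:
        # Q-difference per goal: shortfall of action a versus that goal's best action.
        diffs = [max(q[(state, b)] for b in actions) - q[(state, a)]
                 for q in q_tables]
        h = entropy(goal_posterior(diffs, beta))
        if h > best_entropy:
            best_action, best_entropy = a, h
    return best_action, best_entropy

# Hypothetical toy Q-values for two candidate goals in a single state "s0".
actions = ["left", "right", "up"]
q_goal_a = {("s0", "left"): 10.0, ("s0", "right"): 6.0, ("s0", "up"): 8.0}
q_goal_b = {("s0", "left"): 6.0, ("s0", "right"): 10.0, ("s0", "up"): 8.0}

action, h = most_ambiguous_action([q_goal_a, q_goal_b], "s0", actions)
print(action, h)
```

In this toy state, "up" is equally suboptimal for both goals, so it yields a uniform posterior and the maximum entropy. A greedy rule of this kind is applied independently at every step, which is the behavior the abstract identifies as producing ineffective deceptive actions once the intention has already been revealed.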