Intentional policy graphs: A pipeline for explaining agent behavior through intentions

Published: 09 Apr 2026 · Last Modified: 06 May 2026 · OpenReview Archive Direct Upload · Everyone · CC BY 4.0
Abstract: Artificial intelligence (AI) agents increasingly operate autonomously in complex environments. While these systems can exhibit highly effective and adaptive behavior, they are often opaque, making it difficult for users, developers, and regulators to understand why agents act as they do. This lack of understanding undermines trust, accountability, and the safe deployment of AI systems. This work contributes to the broader goal of trustworthy AI by introducing a methodology that explains agent behavior in terms aligned with how humans naturally reason about actions: through intentions and goals. Rather than focusing only on low-level correlations between states and actions, we model behavior in terms of what an agent is trying to achieve and how strongly it is committed to those objectives. This enables explanations that answer intuitive questions such as what the agent wants to do, how it plans to do it, and why a particular action makes sense in context. Because these explanations are built from partial observations, without access to the agent's internal model, the approach applies to opaque or proprietary systems. Furthermore, the proposed metrics make it possible to reason explicitly about the trade-off between the interpretability and the reliability of explanations. Together, these contributions support the auditing, tracing, debugging, and monitoring of autonomous agents, enabling human understanding of these systems.