Trustworthiness and co-cognition in artificial intelligence systems

Published: 04 Jun 2026, Last Modified: 11 Jun 2026
Venue: PhilML@ICML 2026 Poster
License: CC BY 4.0
Keywords: Causal reasoning, Causal alignment, Trustworthiness in AI, Interpretability
Abstract: Trustworthiness in artificial intelligence requires more than performance on a fixed task distribution: a system may satisfy a formal objective while failing to track what matters to users when conditions change. We argue that trustworthiness requires co-cognition, that is, behavior in which a system reacts specifically to the same objects that structure the user's experience. Here an object is any region of experience that an agent individuates, and a reaction is specific when the object's state change is its minimal cause relative to the observer's causal model. We formalize co-cognition through an object-correspondence condition and a minimality condition, and identify stability, generalization, and robustness as its operational signatures. We further argue that co-cognition is a common cause of both trustworthiness and consciousness attribution, which explains why the two are correlated: trust arises through an active construction process in which specific reactions lead the observer to attribute a shared point of view. We show that current ML systems systematically fail to achieve co-cognition because of correlational learning, and we diagnose distribution shift, reward hacking, adversarial vulnerability, and hallucination as distinct co-cognition failures. The result is a principled account of trustworthiness grounded in causal alignment rather than performance alone.
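The abstract states the minimality condition only informally, so the following is a hypothetical toy reading of it, not the authors' formalization. It assumes the observer's causal model is a set of named discrete variables (the "objects"), models a system's reaction as a function of their states, and treats a reaction as specific to an object when that object's state change is the unique minimal set of changed variables whose intervention alone alters the reaction. The names `minimal_causes`, `is_specific`, `aligned`, `shortcut`, `light`, and `background` are all illustrative assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, List

# Hypothetical encoding: the observer's causal model is a set of named discrete
# variables ("objects"), and a system's reaction is a function of their states.
State = Dict[str, Hashable]
Reaction = Callable[[State], Hashable]


def reaction_changes(react: Reaction, base: State, changed: State,
                     subset: FrozenSet[str]) -> bool:
    """True if intervening on only the variables in `subset` alters the reaction."""
    intervened = dict(base)
    for var in subset:
        intervened[var] = changed[var]
    return react(intervened) != react(base)


def minimal_causes(react: Reaction, base: State, changed: State) -> List[FrozenSet[str]]:
    """Return every minimal set of changed variables whose intervention alters the reaction."""
    diff = [v for v in base if base[v] != changed[v]]
    found: List[FrozenSet[str]] = []
    for k in range(1, len(diff) + 1):  # search subsets in order of increasing size
        for combo in combinations(diff, k):
            subset = frozenset(combo)
            if any(m <= subset for m in found):
                continue  # a smaller set already suffices, so this one is not minimal
            if reaction_changes(react, base, changed, subset):
                found.append(subset)
    return found


def is_specific(react: Reaction, base: State, changed: State, obj: str) -> bool:
    """Toy criterion: the reaction is specific to `obj` if {obj} is its unique minimal cause."""
    return minimal_causes(react, base, changed) == [frozenset({obj})]


# Illustrative scene: the user cares about the traffic light; the background is
# assumed to be spuriously correlated with it in hypothetical training data.
def aligned(s: State) -> str:
    return "go" if s["light"] == "green" else "stop"


def shortcut(s: State) -> str:
    return "go" if s["background"] == "night" else "stop"


base = {"light": "red", "background": "day"}
changed = {"light": "green", "background": "night"}

print(is_specific(aligned, base, changed, "light"))   # True
print(is_specific(shortcut, base, changed, "light"))  # False
```

Both toy systems react identically to the correlated change ("stop", then "go"), so task performance alone cannot separate them, while the minimality check can. If the correlation breaks, for example a red light at night, `shortcut` returns "go" and `aligned` returns "stop", a toy analogue of distribution shift as a co-cognition failure. The exhaustive subset search is exponential in the number of changed variables and is meant only as an illustration.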
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 7