Keywords: Self-Interpretable GNNs; Trustworthiness; Consistency; Faithfulness
TL;DR: We identify the mismatch between SI-GNN training and faithfulness evaluation, show its connection to self-consistency, and propose a simple SC loss that consistently improves explanation quality without architectural changes.
Abstract: Graph Neural Networks (GNNs) achieve strong predictive performance but offer limited transparency in their decision-making. Self-Interpretable GNNs (SI-GNNs) address this by generating built-in explanations, yet their training objectives are misaligned with evaluation criteria such as faithfulness. This raises two key questions: (i) can faithfulness be explicitly optimized during training, and (ii) does such optimization truly improve explanation quality? We show that faithfulness is intrinsically tied to explanation self-consistency and can therefore be optimized directly. Empirical analysis further reveals that self-inconsistency predominantly occurs on unimportant features, linking it to redundancy-driven explanation inconsistency observed in recent work and suggesting untapped potential for improving explanation quality. Building on these insights, we introduce a simple, model-agnostic self-consistency (SC) fine-tuning strategy. Without changing model architectures, SC consistently improves explanation quality across multiple dimensions and benchmarks, offering an effective and scalable pathway to more trustworthy GNN explanations. Our code is publicly available at \url{https://github.com/ICDM-UESTC/SelfConsistencyXGNN}.
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 10786