Keywords: Explainability, neural networks, deep learning, interpretability, provenance networks, attribution, robustness, hallucination, image generation
TL;DR: We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability.
Abstract: We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability. Unlike conventional post-hoc methods, provenance networks learn to link each prediction directly to its supporting training examples as part of the model’s normal operation, embedding interpretability into the architecture itself. Conceptually, the model operates similarly to a learned k-nearest-neighbors (KNN) classifier, where each output is justified by concrete exemplars weighted by relevance in the feature space. This approach enables systematic studies of memorization versus generalization, facilitates the identification of mislabeled or anomalous data points, and enhances robustness to input perturbations. By jointly optimizing the primary task and the explainability objective, provenance networks offer insights into model behavior that traditional deep networks cannot provide. While the model introduces additional computational cost and currently scales to moderately sized datasets, it provides a complementary approach to existing explainability techniques. In particular, it addresses critical challenges in modern deep learning, including model opacity, hallucination, and the assignment of credit to data contributors, thereby improving transparency and trustworthiness in neural models.
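To make the learned-KNN analogy from the abstract concrete, the sketch below shows one way a prediction could be formed as a similarity-weighted combination over a bank of training-exemplar embeddings, so that the weights themselves act as per-example attributions. This is a hypothetical illustration under assumed names (`ProvenanceNetSketch`, its encoder, temperature, and exemplar bank), not the architecture or training objective from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProvenanceNetSketch(nn.Module):
    """Illustrative sketch (not the paper's model): each prediction is a
    similarity-weighted vote over stored training-exemplar embeddings, so the
    softmax weights double as a provenance attribution over training data."""

    def __init__(self, encoder, exemplar_feats, exemplar_labels, num_classes, temperature=0.1):
        super().__init__()
        self.encoder = encoder  # any feature extractor mapping input -> (D,) vector
        # Store L2-normalized exemplar features and one-hot labels as buffers.
        self.register_buffer("exemplar_feats", F.normalize(exemplar_feats, dim=1))
        self.register_buffer("exemplar_onehot",
                             F.one_hot(exemplar_labels, num_classes).float())
        self.temperature = temperature

    def forward(self, x):
        z = F.normalize(self.encoder(x), dim=1)                 # (B, D) query features
        sims = z @ self.exemplar_feats.t() / self.temperature   # (B, N) relevance scores
        weights = sims.softmax(dim=1)                           # attribution over exemplars
        probs = weights @ self.exemplar_onehot                  # (B, C) exemplar-weighted prediction
        return probs, weights


# Hypothetical usage: the task loss is computed on the exemplar-weighted prediction,
# and the top-k entries of `weights` name the training examples supporting each output.
# loss = F.nll_loss(torch.log(probs + 1e-8), targets)
# topk_scores, topk_exemplar_ids = weights.topk(k=5, dim=1)
```

In this reading, a joint objective would simply add the task loss on `probs` to whatever auxiliary term shapes the attribution weights; the exact formulation used by the authors is not specified here.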
Primary Area: interpretability and explainable AI
Submission Number: 20199