Keywords: Explainability, neural networks, deep learning, interpretability, provenance networks, attribution, robustness, hallucination, image generation
TL;DR: We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability.
Abstract: We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability. Unlike conventional post-hoc methods, provenance networks learn to link each prediction directly to its supporting training examples as part of the model’s normal operation, embedding interpretability into the architecture itself. Conceptually, the model operates similarly to a learned k-nearest-neighbors (KNN) classifier, where each output is justified by concrete exemplars weighted by relevance in the feature space. This approach enables systematic studies of memorization versus generalization, facilitates the identification of mislabeled or anomalous data points, and enhances robustness to input perturbations. By jointly optimizing the primary task and the explainability objective, provenance networks offer insights into model behavior that traditional deep networks cannot provide. While the model introduces additional computational cost and currently scales to moderately sized datasets, it provides a complementary approach to existing explainability techniques. In particular, it addresses critical challenges in modern deep learning, including model opacity, hallucination, and the assignment of credit to data contributors, thereby improving transparency and trustworthiness in neural models.
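To make the learned-KNN analogy from the abstract concrete, the sketch below shows one way a prediction could be formed as a similarity-weighted combination over a bank of training-exemplar embeddings, so that the weights themselves act as per-example attributions. This is a hypothetical illustration under assumed names (`ProvenanceNetSketch`, its encoder, temperature, and exemplar bank), not the architecture or training objective from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProvenanceNetSketch(nn.Module):
    """Illustrative sketch (not the paper's model): each prediction is a
    similarity-weighted vote over stored training-exemplar embeddings, so the
    softmax weights double as a provenance attribution over training data."""

    def __init__(self, encoder, exemplar_feats, exemplar_labels, num_classes, temperature=0.1):
        super().__init__()
        self.encoder = encoder  # any feature extractor mapping input -> (D,) vector
        # Store L2-normalized exemplar features and one-hot labels as buffers.
        self.register_buffer("exemplar_feats", F.normalize(exemplar_feats, dim=1))
        self.register_buffer("exemplar_onehot",
                             F.one_hot(exemplar_labels, num_classes).float())
        self.temperature = temperature

    def forward(self, x):
        z = F.normalize(self.encoder(x), dim=1)                 # (B, D) query features
        sims = z @ self.exemplar_feats.t() / self.temperature   # (B, N) relevance scores
        weights = sims.softmax(dim=1)                           # attribution over exemplars
        probs = weights @ self.exemplar_onehot                  # (B, C) exemplar-weighted prediction
        return probs, weights


# Hypothetical usage: the task loss is computed on the exemplar-weighted prediction,
# and the top-k entries of `weights` name the training examples supporting each output.
# loss = F.nll_loss(torch.log(probs + 1e-8), targets)
# topk_scores, topk_exemplar_ids = weights.topk(k=5, dim=1)
```

In this reading, a joint objective would simply add the task loss on `probs` to whatever auxiliary term shapes the attribution weights; the exact formulation used by the authors is not specified here.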
Primary Area: interpretability and explainable AI
Submission Number: 20199