Path-Integrated Loss-Gradient Kernels: Auditing and Similarity for Trained Neural Networks

27 Apr 2026 (modified: 10 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: Despite their success, deep neural networks remain opaque: it is often unclear why a model fails on a particular input, and classical generalization theory offers limited guidance in the overparameterized regime. Gradient-descent training naturally gives rise to path-dependent inner products between data points, but the resulting kernel matrices are asymmetric and can have negative eigenvalues, precluding their use as proper kernels or similarity measures. We show that a simple modification -- replacing output gradients with loss gradients in these inner products -- restores symmetry and positive semi-definiteness, yielding a Mercer kernel: the path-integrated loss-gradient kernel (PLGK). The PLGK supports (i) a fine-grained auditing decomposition that attributes individual predictions to the training data that shaped them, and (ii) an intrinsic, behavior-based similarity measure between inputs. We validate both tools in focused experiments, including pruning studies confirming that audit-identified influences predict retraining outcomes, and a capstone analysis showing that adversarial perturbations exploit a cancellation among training influences that prevents the network from learning on adversarial inputs; breaking this cancellation with a simple mode-aware perturbation largely restores performance.
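A rough sketch of the construction described above, under two assumptions not stated in the abstract: the path integral is discretized as a sum over saved training checkpoints weighted by the learning rate, and the function names (plgk, loss_grads) are hypothetical, not from the paper. Because each checkpoint contributes a Gram matrix of per-example loss gradients, every term -- and hence their sum -- is symmetric and positive semi-definite by construction, which is the property the abstract says output-gradient inner products lack.

    import torch

    def loss_grads(model, loss_fn, xs, ys):
        # Flattened per-example loss gradients at the model's current parameters.
        params = [p for p in model.parameters() if p.requires_grad]
        rows = []
        for x, y in zip(xs, ys):
            loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            g = torch.autograd.grad(loss, params)
            rows.append(torch.cat([gi.reshape(-1) for gi in g]))
        return torch.stack(rows)  # shape (n, num_params)

    def plgk(checkpoints, lrs, loss_fn, xs, ys):
        # Discretized path integral over training: each checkpoint adds a
        # learning-rate-weighted Gram matrix G @ G.T of loss gradients,
        # so K is symmetric PSD, i.e. a proper (Mercer) kernel matrix.
        n = len(xs)
        K = torch.zeros(n, n)
        for model, lr in zip(checkpoints, lrs):
            G = loss_grads(model, loss_fn, xs, ys)
            K += lr * (G @ G.T)
        return K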
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Sebastian_Goldt1
Submission Number: 8644