Keywords: Bayesian, Bayes, Complexity, Interpretability, Influence Functions, Singular Learning Theory, Data Attribution
TL;DR: We introduce a lightweight Bayesian toolkit to analyze trained neural networks by sampling from the local posterior with SGMCMC.
Abstract: We study a lightweight Bayesian probe for analyzing neural networks trained with standard optimization methods (e.g. SGD). Starting from trained parameters, we run stochastic-gradient Markov chain Monte Carlo (SGMCMC) to explore the local posterior, treating the per-sample losses as random quantities. The posterior mean loss defines the \textit{posterior loss gain}, a practical measure of sample difficulty. High loss-gain values capture difficult, atypical, or memorized samples, while lower values indicate easier, typical examples. The posterior covariance between samples defines the \textit{posterior loss covariance kernel}, reflecting shared structure learned by the network. Experiments on MNIST show that the posterior loss gain effectively separates easy digits from hard or mislabeled samples. On ImageNet, initial explorations with the posterior loss covariance kernel surface correlated images that suggest semantically coherent groupings and potential cross-class relationships. Together, the posterior loss gain and loss covariance kernel offer a simple, post-training toolkit for investigating sample difficulty and semantic structure in deep neural networks.
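The abstract describes the probe only in prose, so a minimal illustrative sketch may help fix ideas. This is not the authors' code: it assumes SGLD as the SGMCMC sampler, a toy PyTorch model standing in for a trained checkpoint, and an assumed definition of the loss gain as the posterior mean per-sample loss minus the loss at the trained parameters; the step size, number of draws, and names such as `posterior_loss_gain` and `loss_kernel` are illustrative choices, and posterior temperature/scaling details are glossed over.

```python
# Illustrative sketch (assumptions noted above), not the paper's implementation.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "trained" model and data stand in for a real checkpoint and dataset.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
X = torch.randn(256, 10)
y = torch.randint(0, 3, (256,))
loss_fn = nn.CrossEntropyLoss(reduction="none")

def per_sample_losses(m):
    # Per-example losses, used as the random quantities tracked across posterior draws.
    with torch.no_grad():
        return loss_fn(m(X), y)  # shape: (num_examples,)

base_losses = per_sample_losses(model)  # losses at the trained parameters

# SGLD: gradient step on the loss plus Gaussian noise, starting at the trained
# parameters, to draw samples from the local posterior (temperature ignored here).
step_size, num_draws, burn_in = 1e-4, 200, 50
draws = []
for t in range(num_draws + burn_in):
    model.zero_grad()
    loss_fn(model(X), y).mean().backward()
    with torch.no_grad():
        for p in model.parameters():
            noise = torch.randn_like(p) * (step_size ** 0.5)
            p.add_(-0.5 * step_size * p.grad + noise)
    if t >= burn_in:
        draws.append(per_sample_losses(model))

L = torch.stack(draws)                         # (num_draws, num_examples)
posterior_loss_gain = L.mean(0) - base_losses  # higher = harder / more atypical (assumed definition)
loss_kernel = torch.cov(L.T)                   # covariance between examples' losses across draws
```

Under this reading, `posterior_loss_gain` would be sorted to surface difficult or mislabeled examples, and rows of `loss_kernel` would be inspected for strongly correlated examples; the actual estimator and sampler settings used in the paper may differ.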
Student Paper: Yes
Submission Number: 30