Understanding Task Representations in Neural Networks via Bayesian Ablation

Published: 10 Mar 2026, Last Modified: 07 Apr 2026 · CLeaR 2026 Poster · CC BY 4.0
Keywords: representation learning, interpretability, neural networks, Bayesian inference
TL;DR: We introduce Ablation Mask Distributions, a probabilistic framework for interpreting latent task representations in neural networks.
Abstract: Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. We introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
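The abstract's central idea, defining a distribution over representational units and inferring their causal contributions from ablated task performance, can be illustrated with a minimal sketch. Everything below is a toy assumption for illustration: the unit count, the `task_performance` scoring function, and the Bernoulli mask distribution are hypothetical stand-ins, not the paper's actual method or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a "model" whose task performance depends on a
# hidden representation of n_units; only units 1 and 4 actually matter.
n_units = 8
true_weights = np.zeros(n_units)
true_weights[[1, 4]] = 1.0

def task_performance(mask):
    # Performance of the ablated model: units with mask == 0 are removed.
    return float(true_weights @ mask) / true_weights.sum()

# Sample Bernoulli(0.5) ablation masks and score each ablated model.
n_samples = 2000
masks = rng.integers(0, 2, size=(n_samples, n_units))
scores = np.array([task_performance(m) for m in masks])

# Estimate each unit's causal contribution: mean performance when the
# unit is kept minus mean performance when it is ablated.
contrib = np.array([
    scores[masks[:, j] == 1].mean() - scores[masks[:, j] == 0].mean()
    for j in range(n_units)
])
important = np.argsort(contrib)[::-1][:2]
print(sorted(important.tolist()))  # → [1, 4]
```

Under this toy setup, the two task-relevant units stand out with contributions near 0.5 while irrelevant units hover near zero, which is the kind of causal attribution the framework formalizes probabilistically.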
Submission Number: 11