Dynamic Granularity in the Wild: Differentiable Sheaf Discovery with Joint Computation Graph Pruning

ACL ARR 2025 May Submission183 Authors

08 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: In this paper, we introduce DiscoGP, a novel framework for extracting self-contained modular units, or *sheaves*, within LMs. These sheaves underlie the models' impressive zero-shot performance across a variety of tasks. Our DiscoGP framework extends the concept of functional *circuits*, widely explored in interpretability research, by introducing *sheaves* --- subsets of connection edges and weight parameters in an LM's computation graph --- as interpretation units. The framework identifies these sheaves through a differentiable pruning algorithm that operates jointly on the computation graph's edge connections and the model's weight parameters, reducing the LM to a sparse skeleton while preserving its core capabilities. Experimental results demonstrate that, across a range of linguistic and reasoning tasks, DiscoGP extracts sheaves that preserve 93-100% of the model's task performance while comprising only 1-7% of the original weights and connections. Furthermore, our analysis reveals that, compared to previously identified LM circuits, the sheaves discovered by DiscoGP exhibit superior modularity and functional fidelity. Extending our method to the neuron level also yields novel insights into the inner workings of LLMs.
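The abstract describes differentiable pruning over both edge connections and weight parameters. The sketch below is a minimal, hypothetical illustration of that general idea in PyTorch, not the authors' released implementation: the module name `JointMaskedLinear`, the sigmoid-relaxed masks, and the `sparsity_penalty` helper are all assumptions introduced here for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointMaskedLinear(nn.Module):
    """Hypothetical layer with learnable masks over its weights (weight pruning)
    and over its outputs (edge pruning), trained jointly by gradient descent."""

    def __init__(self, linear: nn.Linear, temperature: float = 1.0):
        super().__init__()
        self.linear = linear
        self.temperature = temperature
        # One logit per weight entry and one per output edge; sigmoid(logit)
        # gives a soft, differentiable mask that a sparsity penalty pushes to 0.
        self.weight_logits = nn.Parameter(torch.zeros_like(linear.weight))
        self.edge_logits = nn.Parameter(torch.zeros(linear.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_mask = torch.sigmoid(self.weight_logits / self.temperature)
        e_mask = torch.sigmoid(self.edge_logits / self.temperature)
        masked_weight = self.linear.weight * w_mask
        out = F.linear(x, masked_weight, self.linear.bias)
        return out * e_mask  # masking an output prunes the whole edge


def sparsity_penalty(module: JointMaskedLinear) -> torch.Tensor:
    """Regularizer encouraging both masks toward zero, so that only a small
    'sheaf' of weights and edges survives while task loss keeps performance."""
    return (torch.sigmoid(module.weight_logits).mean()
            + torch.sigmoid(module.edge_logits).mean())
```

In such a setup, training would minimize the task loss plus a weighted `sparsity_penalty`, and after convergence the soft masks would be thresholded to obtain a discrete subnetwork; the paper's actual objective and discretization may differ.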
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Circuit Discovery, Sheaf Discovery, Weight Pruning, Edge Pruning
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 183