IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck

27 Sept 2024 (modified: 30 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0
Keywords: Information Bottleneck, Circuit Analysis
Abstract: Circuit discovery has recently attracted attention as a potential research direction for explaining the nontrivial behaviors of language models (LMs). It aims to find the computational subgraph, also known as a circuit, that explains an LM's behavior on a specific task. Most studies determine the circuit for a task by performing causal interventions on each component independently. However, they ignore the holistic nature of a circuit, which is an interconnected system of components rather than an independent combination of them. Additionally, existing methods require designing a unique corrupted activation for each task, which is complicated and inefficient. In this work, we propose IBCircuit, a novel circuit discovery approach based on the Information Bottleneck principle, to identify the most informative circuit from a holistic perspective. Furthermore, IBCircuit can be applied to any given task without constructing corrupted activations. Our experiments demonstrate the ability of IBCircuit to identify the most informative circuit in the model. The results from IBCircuit suggest that the earlier layers in Transformer-based models are crucial in capturing factual information.
Primary Area: interpretability and explainable AI
Submission Number: 9201