Keywords: Concept bottleneck models, Information bottleneck
TL;DR: Enhances Concept Bottleneck Models by integrating the Information Bottleneck principle to reduce concept leakage and improve performance
Abstract: Concept Bottleneck Models (CBMs) provide a self-explanatory framework by making predictions based on concepts that humans can understand. However, they often fall short in overall performance and interpretability because they tend to let irrelevant information seep into the concept activations. To tackle concept leakage, we introduce an information-theoretic framework to CBMs by incorporating the Information Bottleneck (IB) principle. Our method ensures that only pertinent information is retained in the concepts by limiting the mutual information between the input data and the concepts. This shift represents a new direction for CBMs, one that not only boosts concept prediction but also reinforces the link between latent representations and comprehensible concepts, leading to a model that is both more robust and more interpretable. Our findings show that our IB-based CBMs enhance the accuracy of concept prediction and diminish concept leakage without compromising the target prediction accuracy when compared to similar models. We also introduce an innovative metric designed to evaluate the quality of concept sets by focusing on performance following interventions. This metric stands in contrast to traditional task performance measures, which can sometimes conceal the impact of concept leakage, by providing a clear and interpretable means of assessing the effectiveness of concept sets.
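Illustrative sketch (not from the submission): the abstract describes limiting the mutual information between the input and the concepts. One common way to make such a constraint trainable is a variational upper bound, where a stochastic concept encoder is penalized by its KL divergence to a fixed prior. The snippet below assumes a Gaussian encoder over concept logits, a standard normal prior, and a weight `beta`. These choices, and all function names, are hypothetical and may differ from the authors' actual objective.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a single sample.

    In variational IB formulations this term upper-bounds the mutual
    information between the input and the stochastic concept encoding.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted concept probabilities and labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(probs)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ib_regularized_concept_loss(mu, log_var, concept_labels, beta=0.1):
    """Concept prediction loss plus a beta-weighted compression penalty.

    Assumption: concept probabilities are read off the encoder mean; a full
    model would sample from the encoder and add a task (label) loss as well.
    """
    concept_probs = [sigmoid(m) for m in mu]
    fit = binary_cross_entropy(concept_probs, concept_labels)
    compression = gaussian_kl_to_standard_normal(mu, log_var)
    return fit + beta * compression


if __name__ == "__main__":
    mu = [2.0, -1.5, 0.3]          # hypothetical encoder means (concept logits)
    log_var = [-1.0, -0.5, 0.0]    # hypothetical encoder log-variances
    labels = [1, 0, 1]             # ground-truth concept annotations
    print(ib_regularized_concept_loss(mu, log_var, labels, beta=0.1))
```

Raising `beta` in this sketch trades concept fit for stronger compression of the input, which is the lever an IB-style objective uses to discourage information unrelated to the concepts from leaking into the concept activations.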
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6801