Towards Achieving Concept Completeness for Unsupervised Textual Concept Bottleneck Models

ACL ARR 2025 February Submission4603 Authors

15 Feb 2025 (modified: 09 May 2025)
License: CC BY 4.0
Abstract: Textual Concept Bottleneck Models (TBMs) are interpretable-by-design models for text classification that predict a set of salient concepts before making the final prediction. This paper proposes the Complete Textual Concept Bottleneck Model (CT-CBM), a novel approach that generates concept labels in a fully unsupervised manner using a small language model, eliminating the need for predefined labeled concepts and for an LLM to perform concept annotation. CT-CBM iteratively identifies and adds important concepts to the bottleneck layer to build a nearly complete concept basis, and it addresses downstream classification leakage through a parallel residual connection. CT-CBM performs competitively against existing approaches, offering a promising way to enhance the interpretability of NLP classifiers without sacrificing performance.
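To make the architecture described in the abstract concrete, here is a minimal sketch of a concept bottleneck layer with a parallel residual connection. The paper itself provides no code, so every name, dimension, and design detail below is an illustrative assumption, not the authors' implementation: a classifier reads only concept activations, while a parallel residual branch carries signal not yet captured by the concept basis (which is the leakage concern the abstract mentions).

```python
# Hypothetical sketch of a textual concept bottleneck with a parallel
# residual connection; all names and dimensions are illustrative and
# do NOT reproduce the CT-CBM implementation.
import torch
import torch.nn as nn

class ConceptBottleneckWithResidual(nn.Module):
    def __init__(self, encoder_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Concept layer: predicts the activation of each discovered concept.
        self.concept_head = nn.Linear(encoder_dim, n_concepts)
        # Final classifier reads only the concept activations.
        self.classifier = nn.Linear(n_concepts, n_classes)
        # Parallel residual branch: routes task signal that the current
        # concept basis does not capture, so it does not leak through
        # the concepts themselves.
        self.residual = nn.Linear(encoder_dim, n_classes)

    def forward(self, h: torch.Tensor):
        # h: (batch, encoder_dim) text embedding from a frozen encoder.
        concepts = torch.sigmoid(self.concept_head(h))
        logits = self.classifier(concepts) + self.residual(h)
        return logits, concepts

# Usage with random features standing in for encoder outputs.
model = ConceptBottleneckWithResidual(encoder_dim=768, n_concepts=32, n_classes=4)
h = torch.randn(8, 768)
logits, concepts = model(h)
```

Under the abstract's iterative scheme, one would expect the residual branch's contribution to shrink as important concepts are added and the concept basis approaches completeness.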
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Interpretability, Explainable AI, Concepts, Concept Bottleneck Models
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4603