Concept-Guided Dictionary Learning for Interpretable Concept Extraction and Attribution in Large Vision–Language Models
Keywords: Vision–Language Models, Large Multimodal Models, Concept-Guided Feature Extraction, Feature Attribution, Interpretability, Explainable AI (XAI), Concept Attribution, Representation Learning, Cross-Modal Understanding, Model Transparency
TL;DR: We propose a method to extract and quantify monosemantic concepts in large vision–language models (LVLMs), enabling better interpretability of their internal representations.
Abstract: Autoregressive Large Vision–Language Models (LVLMs) generate text sequentially, conditioning each token on evolving multimodal states. This makes it difficult to assess whether predictions are grounded in \textbf{visual concepts} or instead reflect hallucination or bias. Existing concept-discovery approaches such as \textbf{TCAV}, \textbf{CRAFT}, and \textbf{CLIP-Dissect} are designed for encoder-only or contrastive models, while recent LVLM methods (CoX-LMM) depend on labeled concepts and simplified settings, which limits their scalability.
We propose \textbf{Concept-Guided Dictionary Learning (CGDL)}, a semi-supervised, scalable framework for multimodal concept discovery in autoregressive LVLMs. CGDL first probes the model to surface textual concepts from a dataset. For each concept, it constructs positive and negative patch sets using concept-grounded crops and randomized backgrounds. A contrastive dictionary-learning stage then disentangles concept-aligned activations from residual noise, yielding sparse, monosemantic vectors that reveal \textbf{semantically aligned visual–textual interactions} and enable faithful attribution of predictions to visual evidence.
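To make the contrastive dictionary-learning stage concrete, the sketch below shows one plausible instantiation in PyTorch. It is a minimal illustration, not the paper's actual objective: the function name, the non-negative sparse codes, the loss weights, and the specific contrastive penalty (suppressing atom co-activation across positive and negative patch sets) are all our own illustrative assumptions.

```python
# Minimal sketch of a contrastive dictionary-learning stage (illustrative,
# not the authors' implementation). Given LVLM visual activations for
# concept-grounded crops (positives) and randomized backgrounds (negatives),
# learn sparse dictionary atoms that fire on the concept but not the noise.
import torch
import torch.nn.functional as F

def learn_concept_dictionary(pos_acts, neg_acts, n_atoms=64,
                             l1_weight=0.1, contrast_weight=1.0,
                             lr=1e-2, steps=500):
    """pos_acts, neg_acts: (N, d) activation matrices.
    Returns (D, codes_pos): unit-norm atoms and sparse positive codes."""
    d = pos_acts.shape[1]
    D = torch.randn(n_atoms, d, requires_grad=True)
    A_pos = torch.zeros(pos_acts.shape[0], n_atoms, requires_grad=True)
    A_neg = torch.zeros(neg_acts.shape[0], n_atoms, requires_grad=True)
    opt = torch.optim.Adam([D, A_pos, A_neg], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        Dn = F.normalize(D, dim=1)            # keep atoms unit-norm
        recon_pos = F.relu(A_pos) @ Dn        # non-negative sparse codes
        recon_neg = F.relu(A_neg) @ Dn
        loss = (
            F.mse_loss(recon_pos, pos_acts)   # reconstruct concept patches
            + F.mse_loss(recon_neg, neg_acts) # reconstruct backgrounds
            + l1_weight * F.relu(A_pos).abs().mean()  # sparsity
            # contrastive term: atoms used on positives should stay silent
            # on negatives, disentangling the concept from residual noise
            + contrast_weight * (F.relu(A_pos).mean(0)
                                 * F.relu(A_neg).mean(0)).sum()
        )
        loss.backward()
        opt.step()
    return F.normalize(D, dim=1).detach(), F.relu(A_pos).detach()
```

Under this reading, an atom with large positive codes and near-zero negative codes is a candidate monosemantic direction, and projecting a prediction's activations onto it gives a simple attribution score.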
On \textbf{ImageNet-1k and MSCOCO}, CGDL outperforms recent interpretability methods with up to \textbf{4\% higher sparsity}, \textbf{11\% greater stability}, and \textbf{17\% lower overlap}, together with strong attribution faithfulness, while scaling efficiently to large concept vocabularies. These results advance concept-based interpretability for LVLMs and provide a practical step toward transparent multimodal reasoning.
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 5603