Can Vision-Language Models Enable More Efficient Concept-Based Learning with Less Supervision for Interpretable Lung Nodule Diagnosis?

MIDL 2026 Short Papers Submission · 15 Apr 2026 (modified: 16 Apr 2026) · CC BY 4.0
Keywords: Concept learning, interpretability, lung nodule diagnosis, vision-language models
Abstract: Interpretability is necessary for the safe deployment of AI systems in clinical practice, especially in tasks such as lung nodule diagnosis. Concept Bottleneck Models (CBMs) provide a promising framework for interpretable predictions by linking decisions to clinically meaningful concepts. However, standard CBMs rely on extensive, time-consuming concept annotations. Recent methods aim to close this gap by leveraging vision-language models (VLMs) for few-shot or even label-free concept learning, yet it remains unclear whether the prior knowledge within VLMs is sufficient for fine-grained, nodule-level concept detection. In this work, we comprehensively investigate how much supervision is essential for reliable concept-based diagnosis, and whether VLMs can improve efficiency. We compare black-box models, standard CBMs, few-shot VLM-based CBMs, and label-free CBMs on CT-based lung nodule diagnosis. The results show that few-shot VLM-based CBMs achieve better concept detection (balanced accuracy (Bacc): 0.78 vs. 0.76; F1 score: 0.76 vs. 0.72) and diagnostic performance (Bacc: 0.72 vs. 0.52; F1: 0.74 vs. 0.36) than standard CBMs, and can even outperform black-box models in F1 score (0.74 vs. 0.66). In contrast, label-free CBMs produce unreliable, clinically meaningless concept representations. These results suggest that VLMs can reduce the supervision burden while improving interpretability and diagnostic performance, but are not yet sufficient for fully label-free concept-based learning.
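For readers unfamiliar with the bottleneck structure the abstract refers to, below is a minimal sketch of a concept bottleneck model in PyTorch. It is illustrative only: the module and variable names are hypothetical, and the backbone, concept set, and training details are not the authors' implementation. The key property it shows is that the final diagnosis depends on the image only through the predicted concepts.

import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Illustrative CBM: image -> concept scores -> diagnosis."""

    def __init__(self, backbone: nn.Module, feat_dim: int,
                 n_concepts: int, n_classes: int):
        super().__init__()
        self.backbone = backbone                              # image encoder (e.g., a CNN over CT patches)
        self.concept_head = nn.Linear(feat_dim, n_concepts)   # predicts clinical concepts
        self.classifier = nn.Linear(n_concepts, n_classes)    # diagnosis from concepts only

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)
        concept_logits = self.concept_head(feats)
        concepts = torch.sigmoid(concept_logits)              # e.g., spiculation, lobulation, margin
        class_logits = self.classifier(concepts)              # bottleneck: prediction uses concepts alone
        return concept_logits, class_logits

Because the classifier sees only the concept activations, each prediction can be inspected (and intervened on) at the concept level; the supervision question the paper studies is how these concept heads are trained, with full annotations, few-shot VLM guidance, or no concept labels at all.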
Submission Number: 95