Keywords: interpretability, lmm, vlm, cav, probes, concept, explainable ai, xai, multimodal, llm, vision
TL;DR: An automatic method for identifying important visual concepts used by large multimodal models
Abstract: Ensuring the reliability of machine learning models in safety-critical domains such as healthcare requires auditing methods that can uncover model shortcomings. While traditional audits range from costly clinical trials to automatic benchmark evaluations, recent advances in automatic interpretability use AI systems to explain other AI models at scale. We introduce an algorithm for identifying salient visual concepts within large multimodal models (LMMs) and demonstrate that leveraging model internals yields more causally relevant insights than black-box approaches. Applying our method to two medical tasks (skin lesion classification and chest radiograph interpretation), we uncover verifiable conceptual dependencies of LMMs and identify ways in which automatic concept labels may be misleading, highlighting both the promise of automatic interpretability for auditing and the continued importance of expert-in-the-loop oversight.
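The abstract does not spell out the algorithm, but the keywords (cav, probes) suggest a CAV-style linear probe over internal activations. The sketch below illustrates that general technique only, under that assumption; the activation arrays, the placeholder gradient, and the concept example are hypothetical stand-ins, not the paper's actual pipeline.

```python
# Hedged sketch of CAV-style concept probing on internal activations.
# All data below are synthetic placeholders; in practice the activations would
# be extracted from a chosen layer of the LMM for images that do / do not
# contain a candidate concept (e.g. "irregular lesion border").
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder activations: rows = examples, cols = hidden units of one layer.
acts_with_concept = rng.normal(loc=0.5, size=(128, 768))
acts_without_concept = rng.normal(loc=0.0, size=(128, 768))

X = np.vstack([acts_with_concept, acts_without_concept])
y = np.concatenate([np.ones(128), np.zeros(128)])

# Linear probe; its (normalized) weight vector acts as the concept activation vector.
probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Concept sensitivity of a prediction: project the gradient of the model output
# w.r.t. this layer's activations onto the CAV (gradient is a placeholder here,
# since no real model is loaded).
grad_of_output_wrt_acts = rng.normal(size=768)
sensitivity = float(grad_of_output_wrt_acts @ cav)
print(f"probe accuracy: {probe.score(X, y):.2f}, concept sensitivity: {sensitivity:+.3f}")
```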
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 10314