Keywords: interpretability, lmm, vlm, cav, probes, concept, explainable ai, xai, multimodal, llm, vision
TL;DR: An automatic method for identifying important visual concepts used by large multimodal models
Abstract: Ensuring the reliability of machine learning models in safety-critical domains such as healthcare requires auditing methods that can uncover model shortcomings. While traditional audits range from costly clinical trials to automatic benchmark evaluations, recent advances in automatic interpretability use AI systems to explain other AI models at scale. We introduce an algorithm for identifying salient visual concepts within large multimodal models (LMMs) and demonstrate that leveraging model internals yields more causally relevant insights than black-box approaches. Applying our method to two medical tasks (skin lesion classification and chest radiograph interpretation), we uncover verifiable conceptual dependencies of LMMs and identify ways in which automatic concept labels may be misleading, highlighting both the promise of automatic interpretability for auditing and the continued importance of expert-in-the-loop oversight.
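The abstract does not spell out the algorithm, but the keywords (cav, probes) suggest a CAV-style linear probe over internal activations. The sketch below illustrates that general technique only, under that assumption; the activation arrays, the placeholder gradient, and the concept example are hypothetical stand-ins, not the paper's actual pipeline.

```python
# Hedged sketch of CAV-style concept probing on internal activations.
# All data below are synthetic placeholders; in practice the activations would
# be extracted from a chosen layer of the LMM for images that do / do not
# contain a candidate concept (e.g. "irregular lesion border").
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder activations: rows = examples, cols = hidden units of one layer.
acts_with_concept = rng.normal(loc=0.5, size=(128, 768))
acts_without_concept = rng.normal(loc=0.0, size=(128, 768))

X = np.vstack([acts_with_concept, acts_without_concept])
y = np.concatenate([np.ones(128), np.zeros(128)])

# Linear probe; its (normalized) weight vector acts as the concept activation vector.
probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Concept sensitivity of a prediction: project the gradient of the model output
# w.r.t. this layer's activations onto the CAV (gradient is a placeholder here,
# since no real model is loaded).
grad_of_output_wrt_acts = rng.normal(size=768)
sensitivity = float(grad_of_output_wrt_acts @ cav)
print(f"probe accuracy: {probe.score(X, y):.2f}, concept sensitivity: {sensitivity:+.3f}")
```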
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 10314