MechSci: Scaling Clinical Science via Mechanistic Interpretability of Multimodal Medical Foundation Models
Keywords: mechanistic interpretability, sparse autoencoders, medical foundation models, hypothesis generation, clinical discovery, imaging biomarkers
TL;DR: Using mechanistic interpretability, we generate thousands of highly prognostic, testable clinical hypotheses (presented as manuscripts like the one submitted) by mapping foundation model embeddings to human-readable features with LLMs, scaling clinical science.
Abstract: Large, multimodal medical datasets harbor complex, latent structures that hold immense potential for scientific discovery. While foundation models excel at extracting predictive signals from such data, their inherent opacity limits their use as tools for generating new scientific knowledge. This work introduces a fully automated pipeline that uncovers novel scientific knowledge by transforming the latent representations of medical foundation models into sparse, human-interpretable concepts. We employ Matryoshka Top-K Sparse Autoencoders (SAEs) to decompose dense feature vectors from a 3D CT imaging foundation model into a sparse basis of learned concepts. An automated interpretation module then uses a large language model to assign a semantic, clinical description to each discovered concept. Finally, the system systematically evaluates each concept for its prognostic value across a range of clinical outcomes, generating testable hypotheses. The entire process, from concept discovery to the generation of this manuscript, is automated. As a proof of concept, we present a detailed analysis of one such automatically generated hypothesis: a novel imaging biomarker, image_Concept_66, which the LLM interpreted as representing "abnormal soft tissue density and stranding in abdominal fat." This feature is shown to be a strong predictor of the future onset of skin cancer (cancer_skin), with an odds ratio of 3.7 (p < 0.001), significantly outperforming clinical risk factors such as patient age, sex, race, BMI, and smoking status. This work demonstrates a scalable, end-to-end system that transforms AI from a predictive tool into an engine for generating interpretable and clinically valuable scientific hypotheses.
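To make the decomposition step concrete, below is a minimal sketch of a Top-K sparse autoencoder of the kind the abstract describes, written in PyTorch. The dimensions, the class name, and the usage at the bottom are illustrative assumptions, and the Matryoshka nesting of dictionary sizes used in the paper is omitted for brevity.

```python
# Minimal Top-K SAE sketch (assumed dimensions; Matryoshka nesting omitted).
import torch
import torch.nn as nn


class TopKSAE(nn.Module):
    """Decompose dense embeddings into a sparse basis of learned concepts."""

    def __init__(self, d_model: int, d_dict: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Encode, then keep only the k largest activations per sample.
        acts = torch.relu(self.encoder(x))
        topk = torch.topk(acts, self.k, dim=-1)
        sparse = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
        recon = self.decoder(sparse)
        return sparse, recon


# Hypothetical usage: 1024-d CT foundation-model embeddings, 8192 concepts, k=32.
sae = TopKSAE(d_model=1024, d_dict=8192, k=32)
embeddings = torch.randn(16, 1024)  # stand-in for real embeddings
concepts, reconstruction = sae(embeddings)
loss = torch.mean((reconstruction - embeddings) ** 2)
```

The sparse concept activations (`concepts` above) are what the downstream interpretation and evaluation stages consume; the reconstruction loss is only used to train the SAE.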
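The prognostic screen reported in the abstract (an odds ratio per concept, adjusted for clinical covariates) can be sketched as a logistic regression per concept-outcome pair. The function and column names below are hypothetical, and categorical covariates are assumed to be numerically encoded already; the paper's exact adjustment set and modeling choices may differ.

```python
# Sketch of a per-concept prognostic screen via adjusted logistic regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm


def concept_odds_ratio(df: pd.DataFrame, concept: str, outcome: str,
                       covariates: list[str]) -> tuple[float, float]:
    """Fit outcome ~ concept + covariates; return the concept's OR and p-value."""
    X = sm.add_constant(df[[concept] + covariates])
    model = sm.Logit(df[outcome], X).fit(disp=0)
    return float(np.exp(model.params[concept])), float(model.pvalues[concept])


# Hypothetical call mirroring the example hypothesis in the abstract.
# or_, p = concept_odds_ratio(df, "image_Concept_66", "cancer_skin",
#                             ["age", "sex", "race", "bmi", "smoking_status"])
```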
Supplementary Material: pdf
Submission Number: 278