Learning biologically relevant features in a pathology foundation model using sparse autoencoders

Published: 12 Oct 2024, Last Modified: 15 Dec 2024 · AIM-FM Workshop @ NeurIPS'24 Poster · CC BY 4.0
Keywords: Sparse autoencoder, medical imaging, mechanistic interpretability, pathology
TL;DR: Sparse autoencoders identify monosemantic features from a pathology foundation model
Abstract: Pathology plays an important role in disease diagnosis, treatment decision-making, and drug development. Previous work on interpretability for machine learning models on pathology images has revolved around methods such as attention value visualization and deriving human-interpretable features from model heatmaps. Mechanistic interpretability is an emerging area of model interpretability that focuses on reverse-engineering neural networks. Sparse Autoencoders (SAEs) have emerged as a promising direction for extracting monosemantic features from model activations. In this work, we train a Sparse Autoencoder on the embeddings of a pathology pretrained foundation model. We discover an interpretable sparse representation of biological concepts within the model embedding space. We investigate how these representations are associated with quantitative human-interpretable features. Our work paves the way for further exploration of interpretable feature dimensions and their utility for medical and clinical applications.
Submission Number: 27
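The abstract describes training a sparse autoencoder on the embeddings of a pathology foundation model. Below is a minimal sketch (not the authors' code) of that general setup: a single-hidden-layer autoencoder with an L1 sparsity penalty fit to precomputed embeddings. The embedding dimension, expansion factor, and sparsity coefficient are illustrative assumptions, not values from the paper.

```python
# Minimal sparse-autoencoder sketch for precomputed foundation-model embeddings.
# All dimensions and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int = 1024, d_hidden: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_embed)

    def forward(self, x: torch.Tensor):
        # ReLU keeps the hidden code non-negative; the L1 term below drives it sparse.
        z = torch.relu(self.encoder(x))
        return self.decoder(z), z

def train_step(model, batch, optimizer, l1_coeff: float = 1e-3):
    recon, z = model(batch)
    # Reconstruction loss plus sparsity penalty on the hidden code.
    loss = torch.mean((recon - batch) ** 2) + l1_coeff * z.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = SparseAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Stand-in for patch-level embeddings from a pathology foundation model.
    embeddings = torch.randn(256, 1024)
    for _ in range(10):
        loss = train_step(model, embeddings, optimizer)
    print(f"final loss: {loss:.4f}")
```

After training, each hidden unit's activations can be inspected across image patches to look for the monosemantic, biologically interpretable features the paper discusses.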