Learning biologically relevant features in a pathology foundation model using sparse autoencoders

Published: 12 Oct 2024, Last Modified: 15 Dec 2024 · AIM-FM Workshop @ NeurIPS'24 Poster · CC BY 4.0
Keywords: Sparse autoencoder, medical imaging, mechanistic interpretability, pathology
TL;DR: Sparse autoencoders identify monosemantic features from a pathology foundation model
Abstract: Pathology plays an important role in disease diagnosis, treatment decision-making, and drug development. Previous work on interpretability for machine learning models on pathology images has revolved around methods such as attention value visualization and deriving human-interpretable features from model heatmaps. Mechanistic interpretability is an emerging area of model interpretability that focuses on reverse-engineering neural networks. Sparse Autoencoders (SAEs) have emerged as a promising direction for extracting monosemantic features from model activations. In this work, we train a Sparse Autoencoder on the embeddings of a pathology pretrained foundation model. We discover an interpretable sparse representation of biological concepts within the model embedding space. We investigate how these representations are associated with quantitative human-interpretable features. Our work paves the way for further exploration of interpretable feature dimensions and their utility for medical and clinical applications.
Submission Number: 27
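The abstract describes training a sparse autoencoder on the embeddings of a pathology foundation model. Below is a minimal sketch (not the authors' code) of that general setup: a single-hidden-layer autoencoder with an L1 sparsity penalty fit to precomputed embeddings. The embedding dimension, expansion factor, and sparsity coefficient are illustrative assumptions, not values from the paper.

```python
# Minimal sparse-autoencoder sketch for precomputed foundation-model embeddings.
# All dimensions and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int = 1024, d_hidden: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_embed)

    def forward(self, x: torch.Tensor):
        # ReLU keeps the hidden code non-negative; the L1 term below drives it sparse.
        z = torch.relu(self.encoder(x))
        return self.decoder(z), z

def train_step(model, batch, optimizer, l1_coeff: float = 1e-3):
    recon, z = model(batch)
    # Reconstruction loss plus sparsity penalty on the hidden code.
    loss = torch.mean((recon - batch) ** 2) + l1_coeff * z.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = SparseAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Stand-in for patch-level embeddings from a pathology foundation model.
    embeddings = torch.randn(256, 1024)
    for _ in range(10):
        loss = train_step(model, embeddings, optimizer)
    print(f"final loss: {loss:.4f}")
```

After training, each hidden unit's activations can be inspected across image patches to look for the monosemantic, biologically interpretable features the paper discusses.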