Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
TL;DR: A first step towards the use of dictionary learning (a popular tool from mechanistic interpretability) for the scientific discovery of biological concepts, using microscopy foundation models
Abstract: Sparse dictionary learning (DL) has emerged as a powerful approach for extracting semantically meaningful concepts from the internals of large language models (LLMs), which are trained mainly on text. In this work, we explore whether DL can extract meaningful concepts from models trained on less human-interpretable scientific data, such as vision foundation models trained on cell microscopy images, where limited prior knowledge exists about which high-level concepts should arise. We propose a novel combination of a sparse DL algorithm, Iterative Codebook Feature Learning (ICFL), with a PCA whitening pre-processing step derived from control data. Using this combined approach, we successfully retrieve biologically meaningful concepts, such as cell types and genetic perturbations. Moreover, we demonstrate how our method reveals subtle morphological changes arising from human-interpretable interventions, offering a promising new direction for scientific discovery via mechanistic interpretability in bioimaging.
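The pipeline described in the abstract can be sketched in two steps: fit a PCA whitening transform on embeddings of control samples only, apply it to all embeddings, then run sparse dictionary learning on the whitened features. The sketch below is illustrative, not the authors' implementation: ICFL is not available in scikit-learn, so `DictionaryLearning` stands in as a generic sparse DL algorithm, and the array shapes and hyperparameters are placeholder assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA, DictionaryLearning

# Placeholder embeddings standing in for foundation-model features
# (e.g. from a masked auto-encoder over microscopy images).
rng = np.random.default_rng(0)
control = rng.normal(size=(200, 32))    # control-condition embeddings
perturbed = rng.normal(size=(100, 32))  # embeddings under perturbations

# Step 1: PCA whitening fit ONLY on control data, as in the paper's
# pre-processing step, then applied to the samples of interest.
pca = PCA(whiten=True).fit(control)
whitened = pca.transform(perturbed)

# Step 2: sparse dictionary learning on the whitened features.
# DictionaryLearning is a stand-in for ICFL; n_components and alpha
# are illustrative choices, not values from the paper.
dl = DictionaryLearning(n_components=16, alpha=1.0, max_iter=20,
                        random_state=0)
codes = dl.fit_transform(whitened)  # sparse codes, one row per sample
atoms = dl.components_              # learned dictionary atoms
```

Each dictionary atom is then a candidate "concept" direction in embedding space; in the paper's setting, individual atoms can be inspected for alignment with cell types or genetic perturbations.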
Lay Summary: Researchers in machine learning are increasingly interested in understanding how complex models process information internally, a field known as mechanistic interpretability. This area focuses on uncovering how models compute their outputs, rather than evaluating how well those outputs align with human intuition. One promising approach from this field, called sparse dictionary learning, has shown success in analyzing language models by identifying components inside the model that correspond to distinct patterns in language. In this work, we explore whether similar techniques can be used to study models trained not on text, but on scientific data such as microscopy images of cells. These models, known as vision foundation models, are trained to capture rich visual features but are much harder to interpret. We introduce a method that combines a sparse learning algorithm with a data-driven pre-processing step to help identify meaningful biological concepts. This method extracts meaningful biological patterns, such as differences between cell types and the effects of genetic perturbations. It reveals not only interpretable internal features but also subtle morphological changes in cells, suggesting new avenues for using machine learning and mechanistic interpretability to advance scientific discovery in bioimage data analysis.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Deep Learning->Foundation Models
Keywords: mechanistic interpretability, scientific discovery, masked auto-encoders
Submission Number: 12649