ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC 4.0
TL;DR: We develop new methods for training scientific foundation models and evaluate a new 1.9B-parameter MAE-G/8 (and others) on a large set of novel benchmarking tasks, demonstrating effective performance for many drug discovery use cases.
Abstract: Deriving insights from experimentally generated datasets requires methods that can account for random and systematic measurement errors and remove them in order to accurately represent the underlying effects of the conditions being tested. Here we present a framework for pretraining on large-scale microscopy datasets that includes three steps: (1) curating a set of diverse and self-consistent training samples, (2) scaling training of an appropriate foundation model architecture on this dataset, and (3) evaluating intermediate layers of the trained model to identify the best representation for downstream tasks. Using this strategy, we present the largest foundation model for cell microscopy data to our knowledge, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. Compared to a previously published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations and obtains the best overall performance on whole-genome relationship recall, batch correction replicate consistency, and compound-gene activity prediction benchmarks.
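Step (3) of the framework, probing intermediate layers to find the most useful representation, can be sketched as a simple linear-probe sweep. The snippet below is an illustrative toy, not the paper's actual pipeline: `select_best_layer` and the synthetic embeddings are hypothetical names, and a real evaluation would use embeddings extracted from each ViT block on held-out microscopy crops.

```python
# Hedged sketch of intermediate-layer evaluation: fit a linear probe on
# each layer's embeddings and keep the layer whose embeddings best
# separate the perturbation labels. All names/data here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def select_best_layer(layer_embeddings, labels, cv=3):
    """Return (best_layer_index, per_layer_scores) from linear probes.

    layer_embeddings: list of (n_samples, dim) arrays, one per model layer.
    labels: (n_samples,) array of perturbation labels.
    """
    scores = []
    for emb in layer_embeddings:
        probe = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(probe, emb, labels, cv=cv).mean())
    return int(np.argmax(scores)), scores

# Toy demo: layer 1 carries the label signal, layers 0 and 2 are noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=120)
noise0 = rng.normal(size=(120, 16))
signal = labels[:, None] * 3.0 + rng.normal(size=(120, 16))
noise2 = rng.normal(size=(120, 16))
best, scores = select_best_layer([noise0, signal, noise2], labels)
print(best)  # index of the most linearly separable layer
```

The same sweep generalizes to any per-layer metric (e.g. relationship recall instead of probe accuracy); the key idea from the abstract is that the final layer need not give the best downstream representation.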
Lay Summary: Scientists use microscopes to generate vast numbers of images showing how cells react to drugs or genetic modifications. Extracting reliable biological insights from this massive and often noisy image data is a major challenge, slowing down our ability to understand diseases and discover new treatments. We developed a new three-step framework to train powerful AI models on this complex cell microscopy data. First, we carefully selected only the most informative images to create a high-quality, diverse training dataset. Second, we trained a very large AI model (with 1.9 billion parameters) on this refined dataset. Third, we discovered that using information from an intermediate processing stage of this AI, rather than its final output, provides a more accurate understanding of the cells' responses. Our method significantly improves how well AI can interpret these microscopy images. The new model is much better at identifying similar biological effects from different experiments, providing more consistent results, and predicting how potential drugs will interact with genes. This approach can accelerate biological discovery and help speed up the development of new medicines by making better sense of large-scale cellular imaging experiments.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://huggingface.co/recursionpharma/OpenPhenom
Primary Area: Applications->Health / Medicine
Keywords: MAE, drug discovery, microscopy, SSL, linear probing, biology, high-content screening, foundation models
Submission Number: 11222