Masked Autoencoders are Scalable Learners of Cellular Morphology

Published: 27 Oct 2023 · Last Modified: 22 Nov 2023 · GenBio@NeurIPS 2023 Spotlight
Keywords: foundation model, masked autoencoder, vision transformer, computer vision, microscopy, high content screening, CRISPR
TL;DR: Masked autoencoders trained on 93 million microscopy images significantly outperform SOTA weakly supervised models, achieving relative improvements as high as 28% at inferring known biological relationships curated from public databases.
Abstract: Inferring biological relationships from cellular phenotypes in high-content microscopy screens presents both a significant opportunity and a significant challenge for biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how self-supervised deep learning approaches scale when training larger models on larger microscopy datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised baselines. At the high end of our scale, a ViT-L/8 trained on over 3.5 billion unique crops sampled from 93 million microscopy images achieves relative improvements as high as 28% over our best weakly supervised baseline at inferring known biological relationships curated from public databases. Relevant code and select models released with this work can be found at: https://github.com/recursionpharma/maes_microscopy.
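As background for the pretraining objective named in the abstract, below is a minimal PyTorch sketch of a masked autoencoder applied to image crops: patches are randomly masked, only the visible patches are encoded, and the model is trained to reconstruct the pixels of the masked patches. This is an illustrative assumption-laden toy, not the authors' implementation; all names, dimensions, and hyperparameters (e.g., 3-channel 64x64 crops, a 75% mask ratio) are invented for the example, and the actual models and code live in the repository linked above.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder: patchify an image, mask most patches,
    encode only the visible ones, then reconstruct the masked pixels."""

    def __init__(self, img_size=64, patch=8, dim=128, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.num_patches = (img_size // patch) ** 2
        patch_dim = 3 * patch * patch  # flattened 3-channel patch; real HCS images often have more channels
        self.embed = nn.Linear(patch_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(dim, patch_dim)

    def patchify(self, x):
        # (B, C, H, W) -> (B, N, C*p*p) non-overlapping patches
        B, C, _, _ = x.shape
        p = self.patch
        x = x.unfold(2, p, p).unfold(3, p, p)
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)
        B, N, D = patches.shape
        keep = int(N * (1 - self.mask_ratio))
        # Random per-image shuffle; the first `keep` indices stay visible.
        perm = torch.rand(B, N, device=imgs.device).argsort(dim=1)
        vis_idx, mask_idx = perm[:, :keep], perm[:, keep:]
        tokens = self.embed(patches) + self.pos
        dim = tokens.size(-1)
        visible = torch.gather(tokens, 1, vis_idx[..., None].expand(-1, -1, dim))
        encoded = self.encoder(visible)  # the encoder sees only visible patches
        # Scatter encoded tokens back into a full-length sequence of mask tokens.
        full = self.mask_token.expand(B, N, dim).clone()
        full.scatter_(1, vis_idx[..., None].expand(-1, -1, dim), encoded)
        pred = self.head(self.decoder(full + self.pos))
        # Reconstruction loss is computed only on the masked patches.
        target = torch.gather(patches, 1, mask_idx[..., None].expand(-1, -1, D))
        output = torch.gather(pred, 1, mask_idx[..., None].expand(-1, -1, D))
        return ((output - target) ** 2).mean()

# Example: one pretraining step on a random batch of 64x64 crops.
model = TinyMAE()
loss = model(torch.randn(4, 3, 64, 64))
loss.backward()
```

The key design choice this sketch illustrates is that the encoder processes only the small visible subset of patches, which is what makes MAE pretraining cheap enough to scale to billions of crops; the lightweight decoder and pixel-reconstruction loss exist only for pretraining and are discarded when the encoder is used to embed cellular morphology.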
Submission Number: 20