Keywords: Multi-Modality, Contrastive Learning, CLIP, Cell Morphology, Molecules, Molecular Retrieval, Zero-Shot Learning, Cell-Painting
TL;DR: We address the challenge of contrastive phenomic molecular retrieval. We demonstrate that pre-trained uni-modal representation models can be used in a variety of ways to significantly improve zero-shot molecular retrieval rates.
Abstract: Predicting molecular impact on cellular function is a core challenge in therapeutic design.
Phenomic experiments, designed to capture cellular morphology, utilize microscopy-based techniques and offer a high-throughput approach to uncovering molecular impact on the cell.
In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning.
Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments.
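At inference time, this retrieval setup can be sketched as a CLIP-style nearest-neighbor search in the joint latent space. The snippet below is a minimal illustration, assuming hypothetical `phenomics_encoder` and `molecule_encoder` modules that map each modality into the shared space; these names are placeholders, not the paper's actual API.

```python
import torch
import torch.nn.functional as F

def retrieve_molecules(phenomic_batch, candidate_molecules,
                       phenomics_encoder, molecule_encoder, k=10):
    """Rank candidate molecular structures by cosine similarity to
    phenomic experiment embeddings (zero-shot retrieval sketch)."""
    # Embed both modalities and L2-normalize so dot products
    # correspond to cosine similarities.
    z_pheno = F.normalize(phenomics_encoder(phenomic_batch), dim=-1)
    z_mol = F.normalize(molecule_encoder(candidate_molecules), dim=-1)

    # Similarity matrix: one row per phenomic sample,
    # one column per candidate molecule.
    sims = z_pheno @ z_mol.T

    # Top-k candidate molecules for each phenomic experiment.
    topk_scores, topk_indices = sims.topk(k, dim=-1)
    return topk_scores, topk_indices
```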
We assess challenges in multi-modal learning across the phenomic and molecular modalities, such as experimental batch effects, inactive molecule perturbations, and encoding perturbation concentration.
We demonstrate improved multi-modal retrieval through (1) a uni-modal pre-trained phenomics model, (2) a novel inter-sample similarity-aware loss, and (3) models conditioned on a representation of molecular concentration.
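The exact form of the inter-sample similarity-aware loss is not specified in this abstract. As an illustrative assumption only, one common way to make a contrastive objective similarity-aware is to soften the one-hot InfoNCE targets with a precomputed inter-sample similarity matrix, so that near-duplicate samples are not treated as hard negatives. The sketch below shows that generic soft-target variant; it should not be read as MolPhenix's actual loss.

```python
import torch
import torch.nn.functional as F

def similarity_aware_contrastive_loss(z_pheno, z_mol, inter_sim,
                                      temperature=0.07):
    """Generic soft-target contrastive loss (illustrative sketch).

    z_pheno, z_mol: (N, D) embeddings of paired phenomic / molecular samples.
    inter_sim: (N, N) precomputed inter-sample similarity scores used to
        soften the usual one-hot contrastive targets.
    """
    z_pheno = F.normalize(z_pheno, dim=-1)
    z_mol = F.normalize(z_mol, dim=-1)

    # Pairwise similarity logits between the two modalities.
    logits = z_pheno @ z_mol.T / temperature

    # Soft labels derived from inter-sample similarity, instead of
    # the hard identity-matrix targets of standard InfoNCE.
    targets = F.softmax(inter_sim / temperature, dim=-1)

    # Symmetric cross-entropy between predictions and soft targets.
    loss_p2m = (-targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
    loss_m2p = (-targets.T * F.log_softmax(logits.T, dim=-1)).sum(dim=-1).mean()
    return 0.5 * (loss_p2m + loss_m2p)
```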
Following this recipe, we propose MolPhenix, a molecular phenomics model.
MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds.
In particular, we demonstrate an 8.1$\times$ improvement in zero-shot molecular retrieval of active molecules over the previous state-of-the-art, reaching 77.33% top-1% accuracy.
These results open the door for machine learning to be applied in virtual phenomics screening, which can significantly benefit drug discovery applications.
Submission Number: 85