MolPhenix: A Multimodal Foundation Model for PhenoMolecular Retrieval

Published: 02 Nov 2024, Last Modified: 09 Nov 2024
Venue: NeurIPS 2024 Workshop FM4Science (Oral)
Readers: Everyone
License: CC BY 4.0
Keywords: Multi-Modality, Contrastive Learning, CLIP, Cell Morphology, Molecules, Molecular Retrieval, Zero-Shot Learning, Cell-Painting
TL;DR: We address the challenge of contrastive phenomic molecular retrieval. We demonstrate that pre-trained uni-modal representation methods can be used in a variety of ways to significantly improve zero-shot molecular retrieval rates.
Abstract: Predicting molecular impact on cellular function is a core challenge in therapeutic design. Phenomic experiments, designed to capture cellular morphology, use microscopy-based techniques and offer a high-throughput solution for uncovering molecular impact on the cell. In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning. Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments. We assess challenges in multi-modal learning of phenomics and molecular modalities, such as experimental batch effects, inactive molecule perturbations, and encoding perturbation concentration. We demonstrate improved multi-modal learner retrieval through (1) a uni-modal pre-trained phenomics model, (2) a novel inter-sample similarity-aware loss, and (3) models conditioned on a representation of molecular concentration. Following this recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds. In particular, we demonstrate an 8.1$\times$ improvement in zero-shot molecular retrieval of active molecules over the previous state-of-the-art, reaching 77.33% top-1% accuracy. These results open the door for machine learning to be applied in virtual phenomics screening, which can significantly benefit drug discovery applications.
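Illustrative sketch: the abstract describes aligning paired phenomic and molecular samples with contrastive learning and scoring retrieval by top-1% accuracy. The NumPy code below is a minimal sketch of that general setup, not the MolPhenix implementation. It assumes a standard CLIP-style symmetric InfoNCE objective in place of the inter-sample similarity-aware loss proposed in the paper, and it assumes that top-k% accuracy means the paired molecule ranks within the top k% of candidates by cosine similarity. The function names, the temperature of 0.07, and the embedding shapes are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(pheno_emb, mol_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each array is assumed to be a matched pair.

    This is the generic CLIP objective (an assumption), not the paper's loss.
    """
    p = l2_normalize(pheno_emb)
    m = l2_normalize(mol_emb)
    logits = p @ m.T / temperature  # shape (n_pheno, n_mol)
    idx = np.arange(len(p))
    # Phenomics -> molecule direction: softmax over molecules (columns).
    loss_p2m = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Molecule -> phenomics direction: softmax over experiments (rows).
    loss_m2p = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_p2m + loss_m2p)


def top_percent_accuracy(pheno_emb, mol_emb, percent=1.0):
    """Fraction of phenomic queries whose paired molecule ranks in the top `percent`%."""
    sims = l2_normalize(pheno_emb) @ l2_normalize(mol_emb).T
    k = max(1, int(np.ceil(sims.shape[1] * percent / 100.0)))
    true_sim = np.diag(sims)[:, None]
    # Rank = number of candidate molecules scoring strictly higher than the true pair.
    rank = (sims > true_sim).sum(axis=1)
    return float((rank < k).mean())
```

In this sketch, retrieval is zero-shot in the sense that `mol_emb` can hold embeddings of molecules never paired with a phenomic experiment during training; only the encoders that produce the embeddings (not shown) are learned.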
Submission Number: 81