Track: Full Paper Track
Keywords: Multi-Modal, Representation Learning, Drug Design, Contrastive Learning
TL;DR: We propose a multimodal molecular representation learning framework that uses contrastive learning to integrate chemical structures with biological modalities, yielding more biologically informed molecular representations.
Abstract: Molecular representation learning is a fundamental challenge in AI-driven drug discovery. Traditional unimodal approaches rely solely on chemical structures and often fail to capture the biological context needed for accurate toxicity and activity prediction. To address this, we propose a multimodal representation learning framework that integrates molecular structures with two biological modalities: morphological features from Cell Painting assays and transcriptomic profiles from the LINCS L1000 dataset. Unlike approaches that require complete triplets (molecule, morphology, transcriptome), our model requires only paired data, namely (molecule, morphology) and (molecule, transcriptome) pairs, making it more practical and scalable. It uses contrastive learning to align molecular representations with each biological modality, even when no fully matched triplets are available. We evaluate the framework on the ChEMBL20 dataset using linear probing across 1,320 tasks and observe improvements in predictive performance. By incorporating diverse biological modalities, our approach yields more robust and biologically informed molecular representations, enhancing the predictive power of AI models in drug discovery.
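The pairwise training idea in the abstract can be illustrated with a minimal sketch. The abstract says only that contrastive learning is used, so everything below is an assumption for illustration rather than the authors' implementation: the symmetric InfoNCE form of the loss, the temperature value, the function names, and the premise that separate encoders (not shown) have already produced the embeddings. The point is that the two pair sets may come from different molecules, so no (molecule, morphology, transcriptome) triplet is ever needed.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(mol_emb, bio_emb, temperature=0.1):
    """Symmetric InfoNCE loss (assumed form): row i of mol_emb is the
    positive for row i of bio_emb; all other rows act as negatives."""
    m = l2_normalize(mol_emb)
    b = l2_normalize(bio_emb)
    logits = m @ b.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(m))
    loss_m2b = -log_softmax(logits, axis=1)[idx, idx].mean()  # molecule -> biology
    loss_b2m = -log_softmax(logits, axis=0)[idx, idx].mean()  # biology -> molecule
    return 0.5 * (loss_m2b + loss_b2m)

def pairwise_multimodal_loss(mol_a, morph, mol_b, expr, temperature=0.1):
    """Hypothetical combined objective: one contrastive term per pair set.
    mol_a/morph and mol_b/expr may come from disjoint sets of molecules."""
    return info_nce(mol_a, morph, temperature) + info_nce(mol_b, expr, temperature)

# Toy usage with random stand-ins for encoder outputs (assumed dimensions).
rng = np.random.default_rng(0)
mol_a, morph = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
mol_b, expr = rng.normal(size=(6, 16)), rng.normal(size=(6, 16))
total = pairwise_multimodal_loss(mol_a, morph, mol_b, expr)
print(float(total))
```

Note that the two batches in the toy usage have different sizes, reflecting that the (molecule, morphology) and (molecule, transcriptome) pair sets are independent.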
Attendance: Muhammad Arslan Masood
Submission Number: 92