MolEmb: Multimodal Large Language Models Can Be Strong Molecular Embedding Models

Published: 28 May 2026, Last Modified: 11 Jun 2026
Venue: ICML 2026 FM4LS Workshop Poster
License: CC BY 4.0
Keywords: Multimodal Foundation Models, Molecular Embedding Models, Multimodal Large Language Models, Small-Molecule Modeling, Drug Discovery
Abstract: Small-molecule modeling is important across drug discovery, chemical biology, and computational life sciences, where molecular embedding models can serve as foundational infrastructure for property prediction, virtual screening, toxicity assessment, and retrieval. Most molecular encoders are specialist models built around a single molecular view, producing unconditional vectors with no language interface for varying the representation. We ask whether multimodal large language models (MLLMs), which natively process images, text, and symbolic inputs, can instead serve as general molecular embedding models that produce embeddings conditioned on both a molecular profile and a natural-language semantic context. We introduce MolEmb, a lightweight framework that adapts MLLMs by aligning molecular profiles with textual descriptions in a shared embedding space using a bidirectional contrastive objective. The resulting embedding model is competitive on molecular property prediction and supports cross-modal molecule-text retrieval in the same space. We further introduce MolCAR, a diagnostic benchmark for context-aware retrieval, and find that context-aware molecular embedding is primarily a data property of the supervision. These results suggest that MLLMs are not merely chemistry assistants or generators, but a viable and extensible route to general molecular embedding models for life-science workflows.
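The abstract names a bidirectional contrastive objective but does not spell out its form. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style loss over a batch of paired molecule and text embeddings, where the i-th molecule matches the i-th description. The function names, the temperature value of 0.07, and the equal weighting of the two directions are assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def bidirectional_contrastive_loss(mol_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical sketch, not the paper's code).

    mol_emb[i] and txt_emb[i] are treated as a positive pair; every other
    pairing in the batch acts as a negative.
    """
    mol = [l2_normalize(v) for v in mol_emb]
    txt = [l2_normalize(v) for v in txt_emb]
    n = len(mol)

    # Cosine similarities scaled by the temperature: rows are molecules,
    # columns are texts.
    sim = [[sum(a * b for a, b in zip(m, t)) / temperature for t in txt]
           for m in mol]

    # Molecule-to-text direction: each row should peak on its diagonal entry.
    mol_to_txt = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Text-to-molecule direction: the same computation on the transpose.
    sim_t = [list(col) for col in zip(*sim)]
    txt_to_mol = -sum(log_softmax(sim_t[i])[i] for i in range(n)) / n

    # Equal weighting of the two directions is an assumption of this sketch.
    return 0.5 * (mol_to_txt + txt_to_mol)


# Toy batch: two 2-d "molecule" embeddings, with matching and mismatched texts.
molecules = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = bidirectional_contrastive_loss(molecules, aligned_texts)
loss_shuffled = bidirectional_contrastive_loss(molecules, shuffled_texts)
```

In this toy batch the aligned pairing gives a much lower loss than the shuffled one, which is the behavior a contrastive alignment objective is meant to reward. Because both modalities end up in one space, the same similarity matrix can be read row-wise for molecule-to-text retrieval or column-wise for text-to-molecule retrieval.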
Submission Number: 6