MolJET: Multimodal Joint Embedding Transformer for Conditional de novo Molecular Design and Multi-Property Optimization

Published: 01 Feb 2023, Last Modified: 13 Feb 2023
Submitted to ICLR 2023
Readers: Everyone
Keywords: Transformers, Multimodal, Molecules, Generative, Drug-design, LLM
TL;DR: MolJET is a foundational generative chemistry model for molecular design that uses joint embeddings learned from three chemistry-related modalities to perform conditional multi-property optimization.
Abstract: Multi-property constrained optimization of molecules using generative de novo design models is vital for the successful application of Artificial Intelligence (AI) towards materials and drug discovery. Yet there remains a gap between the reported performance of such models in the literature and their practical utility in real-world design scenarios. Furthermore, existing models are largely inaccessible to chemists without an extensive background in computer science. To address these challenges, we propose a generative foundation model, the Multimodal Joint Embedding Transformer (MolJET), which performs conditional generation of desired molecular distributions based on human-interpretable chemistry prompts in a zero-shot manner. We assess MolJET on the standard benchmarks available in the GuacaMol and MIMOSA evaluation frameworks. These include structure-based sampling tasks as well as a range of multi-property optimization tasks that probe a model's ability to design drug-like molecules given realistic property constraints. We demonstrate that with self-supervised pretraining, MolJET outperforms 80% of task-optimized models while using zero-shot inference and beats all baselines after minimal supervision. Moreover, the performance of MolJET on text-only conditioning tasks improves with the inclusion of property modalities during training, highlighting the importance of a multimodal approach to molecular design. MolJET is the first example of text-based de novo molecular design using large-scale multimodal foundation models and should serve as a building block towards further improvements to accessible AI for chemists.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Generative models
