MolGen-Transformer: An open-source self-supervised model for Molecular Generation and Latent Space Exploration

Published: 08 Oct 2024, Last Modified: 03 Nov 2024
Venue: AI4Mat-NeurIPS-2024
License: CC BY 4.0
Submission Track: Full Paper
Submission Category: AI-Guided Design
Keywords: Molecular generation, Self-supervised learning, Organic molecule synthesis, Latent space exploration, Molecular diversity
Supplementary Material: pdf
TL;DR: MolGen-Transformer is a generative AI model trained on 198 million organic molecules, achieving 100% reconstruction accuracy and generating chemically similar, diverse, and intermediate molecules for AI-guided design and synthesis.
Abstract: We present the MolGen-Transformer, a generative AI model achieving 100% reconstruction accuracy through self-supervised training on a large, curated meta-dataset of organic molecules with fewer than 168 atoms. MolGen-Transformer produces valid molecular structures using the SELF-referencing Embedded Strings (SELFIES) representation. Our training dataset comprises 198 million organic molecules, selected to encompass a wide range of organic structures. We illustrate the generative capability of this model in three ways: (a) generating chemically similar molecules, where the model creates valid molecules structurally similar to a given prompt molecule; (b) producing diverse molecules, where the model creates structurally diverse valid molecules from a random latent seed; and (c) identifying chemical intermediates, where the model creates a sequence of valid molecules connecting two given molecules. MolGen-Transformer thus enables the generation and exploration of structurally similar molecules and provides insight into structural pathways between molecules. The model weights and inference methods are publicly available to support community use, and we provide an easy-to-use website for exploration.
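The chemical-intermediate capability described in (c) amounts to traversing the model's latent space between two encoded molecules and decoding points along the path. The sketch below illustrates this idea under assumptions: the functions `encode_latent` and `decode_latent` are hypothetical stand-ins for whatever encoder/decoder interface the released MolGen-Transformer weights expose, and linear interpolation is used only as the simplest possible traversal; only the `selfies` calls are from a real library.

```python
# Minimal sketch of latent-space interpolation between two molecules.
# `encode_latent` / `decode_latent` are hypothetical placeholders for the
# model's encoder/decoder; they are NOT the actual MolGen-Transformer API.
import numpy as np
import selfies as sf  # SELFIES reference implementation (pip install selfies)


def interpolate_molecules(smiles_a, smiles_b, encode_latent, decode_latent, n_steps=8):
    """Decode molecules along a straight line between two latent vectors.

    encode_latent: callable mapping a SELFIES string to a latent vector (np.ndarray)
    decode_latent: callable mapping a latent vector back to a SELFIES string
    Both are assumed to wrap the trained model and are passed in by the caller.
    """
    z_a = encode_latent(sf.encoder(smiles_a))  # SMILES -> SELFIES -> latent
    z_b = encode_latent(sf.encoder(smiles_b))
    path = []
    for t in np.linspace(0.0, 1.0, n_steps):
        z_t = (1.0 - t) * z_a + t * z_b        # linear interpolation in latent space
        selfies_t = decode_latent(z_t)         # latent -> SELFIES string
        path.append(sf.decoder(selfies_t))     # SELFIES -> SMILES (SELFIES guarantees validity)
    return path
```

Because SELFIES strings decode to syntactically valid molecules by construction, every intermediate in the returned path is a valid structure, which is the property the abstract highlights.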
Submission Number: 80