Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Published: 25 Oct 2023, Last Modified: 10 Dec 2023, AI4D3 2023 Poster
Keywords: Molecular generation
TL;DR: We introduce a novel hierarchical textual inversion framework for data-efficient molecular generation.
Abstract: A molecular generation framework that remains effective with only a limited number of molecules is often important for practical deployment, e.g., in drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experiments. To tackle this issue, we introduce Hierarchical textual Inversion for Molecular Generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by a recent textual inversion technique in the visual domain, which achieves data-efficient generation by simply optimizing a single new text token of a pre-trained text-to-image generative model. However, we find that naively adopting this technique fails for molecules due to their complicated and structured nature. Hence, we propose a hierarchical textual inversion scheme: in addition to the original single text token of textual inversion, which learns the concept common to the molecules, we introduce low-level tokens that are selected differently for each molecule. We then generate molecules with a pre-trained text-to-molecule model by interpolating the low-level tokens. Extensive experiments demonstrate the superiority of HI-Mol and its notable data efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50$\times$ less training data. We also show the efficacy of HI-Mol in various applications, including molecular optimization and low-shot molecular property prediction.
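To make the interpolation step concrete, the following is a minimal sketch and not the authors' implementation. It assumes the learned low-level tokens are fixed-length embedding vectors and that new tokens come from linear interpolation with a random mixing coefficient; the abstract does not specify the interpolation scheme. The names `interpolate_tokens` and `sample_interpolated_token`, and the toy 4-dimensional embeddings, are hypothetical.

```python
import random

def interpolate_tokens(token_a, token_b, lam):
    """Linearly mix two embedding vectors: lam * a + (1 - lam) * b."""
    if len(token_a) != len(token_b):
        raise ValueError("embeddings must have the same dimension")
    return [lam * a + (1.0 - lam) * b for a, b in zip(token_a, token_b)]

def sample_interpolated_token(low_level_tokens, rng):
    """Pick two distinct learned tokens and mix them with a random coefficient.

    Assumption: linear interpolation with lam drawn uniformly from [0, 1).
    """
    token_a, token_b = rng.sample(low_level_tokens, 2)
    lam = rng.random()
    return interpolate_tokens(token_a, token_b, lam)

# Hypothetical low-level token embeddings, one per training molecule.
low_level_tokens = [
    [0.0, 1.0, 0.0, 2.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.5, 0.5, 1.0, 1.0],
]

rng = random.Random(0)  # seeded for reproducibility
new_token = sample_interpolated_token(low_level_tokens, rng)
```

Under these assumptions, the interpolated vector would be used in the prompt of the pre-trained text-to-molecule model together with the shared token, so that each sampled vector conditions the generation of a new molecule.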
Submission Number: 18