MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction

Published: 25 Oct 2023, Last Modified: 10 Dec 2023
Venue: AI4D3 2023 Poster
Keywords: Drug Discovery, Large Language Models (LLM), Multi-Modal Training
Abstract: Textual information offers significant advantages in the drug design process: it provides insight into complex molecular structures and enables molecule design from textual instructions. Building on recent advances in applying Large Language Models (LLMs) to multi-modal data, we aim to leverage LLMs for molecular property prediction. We introduce MoleculeGPT, which answers queries about molecular properties given molecular structure inputs. To train MoleculeGPT, we curated a new instruction-following dataset from the raw molecule descriptions in PubChem. We evaluate MoleculeGPT on multiple-choice questions and on several downstream molecular property prediction tasks for drug design. Experimental results show that MoleculeGPT generates responses that approach human-level performance and performs strongly across diverse downstream tasks.
Submission Number: 34