MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning

Published: 17 Jun 2024, Last Modified: 27 Jun 2024, AccMLBio Spotlight, CC BY 4.0
Keywords: Molecular Learning, Parameter-Efficient Foundation Model, Biological Discovery
TL;DR: MiniMol is an open-source foundation model for molecular machine learning that outperforms the best previous foundation model on 17 of 22 downstream tasks in the TDC ADMET group while using one-tenth as many parameters.
Abstract: We propose MiniMol, an open-source foundation model for molecular machine learning that outperforms the best previous foundation model on 17 of 22 downstream tasks from the Therapeutic Data Commons (TDC) ADMET group while using one-tenth as many parameters. This efficiency is achieved with a graph neural network (GNN) pre-trained on about 3,300 sparsely defined graph- and node-level tasks, using a dataset of 6 million molecules and 500 million quantum and biological labels. Through multi-task, multi-label supervised training, the model learns embeddings that generalize well to a wide range of biological tasks and, as our experiments demonstrate, can be used efficiently by simple Multi-Layer Perceptron (MLP) models on downstream tasks.
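"Sparsely defined" tasks means that each molecule carries labels for only a small subset of the roughly 3,300 tasks. The sketch below shows one common way to handle this in multi-task supervised training: compute the loss only over observed labels. It is a minimal illustration, not MiniMol's actual training code. It makes three assumptions. Missing labels are stored as `None`. Each task is either regression (squared error) or binary classification (cross-entropy on a logit). The function name `masked_multitask_loss` and the task-type tags are hypothetical.

```python
import math
from typing import Optional, Sequence

# Hypothetical task-type tags (an assumption; MiniMol's label schema is not given here).
REGRESSION = "regression"
BINARY = "binary"


def masked_multitask_loss(
    predictions: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[float]]],
    task_types: Sequence[str],
) -> float:
    """Mean per-label loss over observed (non-None) entries only.

    predictions[i][t]: model output for molecule i on task t
        (a raw logit for binary tasks, a value for regression tasks).
    labels[i][t]: target, or None when molecule i is unlabeled for task t.
    """
    total = 0.0
    count = 0
    for pred_row, label_row in zip(predictions, labels):
        for t, (pred, label) in enumerate(zip(pred_row, label_row)):
            if label is None:
                continue  # missing label: contributes no loss (and hence no gradient)
            if task_types[t] == REGRESSION:
                total += (pred - label) ** 2
            elif task_types[t] == BINARY:
                # Numerically stable binary cross-entropy computed from a logit.
                total += max(pred, 0.0) - pred * label + math.log1p(math.exp(-abs(pred)))
            else:
                raise ValueError(f"unknown task type: {task_types[t]}")
            count += 1
    if count == 0:
        raise ValueError("batch contains no observed labels")
    return total / count


# Usage: two molecules, two tasks, each molecule labeled on only one task.
preds = [[0.5, 0.0], [2.0, 1.0]]
labels = [[1.0, None], [None, 1.0]]
loss = masked_multitask_loss(preds, labels, [REGRESSION, BINARY])
```

Masking keeps unlabeled (molecule, task) pairs from being treated as negatives or zeros. This matters when most entries of the label matrix are empty. Here the predictions at the two `None` positions have no effect on `loss`.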
Submission Number: 16