Multi-Dataset Multi-Task Framework for Learning Molecules and Protein-target Interactions Properties

22 Sept 2022 (modified: 13 Feb 2023)
ICLR 2023 Conference Withdrawn Submission
Readers: Everyone
Keywords: Graph Neural Network, Molecules, Protein-ligand binding, Multidataset, Multitask
TL;DR: Graph Neural Network; Multidataset Multitask; Molecular Property Prediction; Protein-ligand Binding Affinity
Abstract: Molecular property prediction and protein-target interaction prediction with deep learning have become increasingly popular in drug discovery pipelines in recent years. An important factor limiting progress in both areas is the scarcity of labeled data. One promising way to address this problem is to learn a shared embedding from multiple prediction tasks within one molecular type, e.g., molecule or protein, because different tasks may share similar coarse-grained structural information. Unlike previous methods, we argue that, because molecules and protein-target complexes may be locally similar in structure, coarse-grained latent embeddings can also be found across different molecular types. To take advantage of this, we propose a new Multi-Dataset Multi-Task Graph Learning (MDMT-GL) framework, which makes the most of the available labeled data by training molecular property prediction and protein-target interaction prediction simultaneously. MDMT-GL augments molecular representations with equivariant properties, 2D local structures, and 3D geometric information. It learns coarse-grained embeddings for molecules and proteins while also distinguishing fine-grained representations in downstream prediction tasks with unique characteristics. We implement and evaluate MDMT-GL on 2 molecular dynamic datasets and 2 protein-target datasets, comprising 825 tasks and over 3 million data points. MDMT-GL achieves state-of-the-art performance on several tasks and competitive performance on others. These experimental results confirm that molecules and proteins share some coarse-grained structures, that the coarse-grained embedding is trainable, and that the resulting fine-grained embeddings are more representative.
To the best of our knowledge, this is the first work to apply multi-task learning across different molecular types and to verify the structural similarity between molecules and protein-target complexes.
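
The multi-dataset multi-task idea described in the abstract, one shared coarse-grained encoder feeding task-specific fine-grained heads, can be illustrated with a minimal sketch. This is not the MDMT-GL architecture, which the abstract describes as a graph learning framework using equivariant, 2D, and 3D information. Here a single linear layer stands in for the shared encoder, and every name (`SharedEncoder`, `TaskHead`, `multi_task_loss`, the task names, the toy features and labels) is a hypothetical chosen for illustration only.

```python
import random


def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


class SharedEncoder:
    """Hypothetical stand-in for a shared (coarse-grained) encoder:
    one linear layer followed by ReLU, used by every task."""

    def __init__(self, in_dim, hid_dim, rng):
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(hid_dim)]

    def __call__(self, x):
        return [max(0.0, dot(row, x)) for row in self.W]


class TaskHead:
    """Hypothetical task-specific (fine-grained) head: one linear output."""

    def __init__(self, hid_dim, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(hid_dim)]

    def __call__(self, h):
        return dot(self.w, h)


def multi_task_loss(encoder, heads, batch):
    """Mean squared error over (task_name, features, label) triples.
    Every sample passes through the shared encoder, but only through
    the head of the task it is labeled for."""
    total = 0.0
    for task, x, y in batch:
        pred = heads[task](encoder(x))
        total += (pred - y) ** 2
    return total / len(batch)


rng = random.Random(0)
encoder = SharedEncoder(in_dim=4, hid_dim=8, rng=rng)
heads = {
    "molecule_property": TaskHead(8, rng),   # assumed molecule-side task
    "binding_affinity": TaskHead(8, rng),    # assumed protein-target task
}

# Toy mixed batch: samples from two different datasets share one encoder.
batch = [
    ("molecule_property", [1.0, 0.0, 0.5, 0.2], 0.3),
    ("binding_affinity", [0.1, 0.9, 0.0, 0.4], 1.2),
]
loss = multi_task_loss(encoder, heads, batch)
```

In a setup like this, gradients from both datasets would update the shared encoder, while each head is updated only by samples from its own task. That is one common way to let scarce labels in one task benefit from labels in another.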
Please Choose The Closest Area That Your Submission Falls Into: Machine Learning for Sciences (e.g., biology, physics, health sciences, social sciences, climate/sustainability)
Supplementary Material: zip