BenchMol: A Multi-Modality Benchmarking Platform for Molecular Representation Learning

ICLR 2025 Conference Submission 714 Authors

14 Sept 2024 (modified: 27 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Multi-Modality Learning, Benchmarks and Datasets, Drug Discovery, Molecular Representation Learning
Abstract: Molecular representation learning (MRL) plays a vital role in high-precision drug discovery. Currently, people represent molecules in different modalities (such as sequences, graphs, and images), and have developed many MRL methods. However, three key challenges hinder further progress in the field of MRL: (i) Lack of systematic and unified evaluation on models of different modalities, resulting in unfair comparisons or being affected by randomness; (ii) The specific advantages between different molecular modalities are unclear; (iii) Lacking a unified platform to integrate data of different modalities and a large number of MRL methods. Therefore, we propose the first MRL platform supporting different modalities, called BenchMol, to integrate a large number of sing-modal MRL methods with different modalities and evaluate them systematically and fairly. BenchMol has four attractive features: (i) Rich modalities: BenchMol supports 7 major modalities of molecules, such as fingerprint, sequence, graph, geometry, image, geometry image, and video; (ii) Comprehensive methods: BenchMol integrates 23 mainstream MRL methods to process these modalities; (iii) New benchmarks: BenchMol constructs two new benchmarks based on PCQM4Mv2 and ChEMBL 34, called MBANet and StructNet, for a more systematic evaluation. (iv) Comprehensive evaluation: evaluation covers different aspects of molecules, such as basic attributes and molecular types. Through BenchMol, we conduct large-scale research on methods of different modalities and report many insightful findings. We hope that BenchMol can help researchers quickly use MRL methods with different modalities on the one hand; and on the other hand, provide meaningful insights into multi-modal MRL and help researchers choose appropriate representations in downstream tasks. We open-sourced BenchMol in \href{https://anonymous.4open.science/r/BenchMol}{Github}.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 714