MUBen: Benchmarking the Uncertainty of Molecular Representation Models

Published: 17 Apr 2024. Last Modified: 17 Apr 2024. Accepted by TMLR.
Abstract: Large molecular representation models pre-trained on massive unlabeled data have shown great success in predicting molecular properties. However, these models may overfit the fine-tuning data, resulting in over-confident predictions on test data that fall outside the training distribution. To address this issue, uncertainty quantification (UQ) methods can be used to improve the calibration of model predictions. Although many UQ approaches exist, not all of them lead to improved performance. While some studies have incorporated UQ into molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored. To address this gap, we present MUBen, which evaluates different UQ methods on state-of-the-art backbone molecular representation models to investigate their capabilities. By fine-tuning various backbones that take different molecular descriptors as inputs, paired with UQ methods from different categories, we assess the influence of architectural decisions and training strategies on property prediction and uncertainty estimation. Our study offers insights for selecting UQ methods for backbone models, which can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery.
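The abstract's notion of "calibration" is commonly measured with metrics such as the expected calibration error (ECE), which compares a classifier's confidence to its empirical accuracy. The sketch below is a generic, minimal illustration of ECE over equal-width confidence bins; it is not taken from the MUBen codebase, and all function and variable names are illustrative.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Generic ECE sketch: the bin-weight-averaged gap between mean
    predicted confidence and empirical accuracy in each confidence bin.
    `confidences` are predicted probabilities in [0, 1]; `correct` are
    0/1 indicators of whether each prediction was right."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue  # skip empty bins
        accuracy = correct[mask].mean()
        avg_confidence = confidences[mask].mean()
        # weight each bin's gap by the fraction of samples it holds
        ece += mask.mean() * abs(accuracy - avg_confidence)
    return ece
```

An over-confident model (high confidence, low accuracy) yields a large ECE, which is the failure mode UQ methods aim to reduce; a well-calibrated model keeps confidence and accuracy close in every bin.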
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The following outlines the revisions made to the previous version during the author-response period:

**Major Updates:**
- Included Figure 6, Figure 10, and associated discussions in Section 5, under "Impact of Training-Test Distribution Shift," and in Appendix A, under "Binning Test Data by Similarity to Training Scaffolds," following the suggestion of Action Editor Pj5z.
- Added Table 6 in response to [comments from Reviewer rEgw](https://openreview.net/forum?id=qYceFeHgm4&noteId=NfnotzFAMv).
- De-anonymized the manuscript and updated the GitHub repository link accordingly.
- Included an "Acknowledgement" section.

**Minor Updates:**
- Adjusted some mathematical notations to enhance readability and align with the recommendations of the TMLR template.
- Standardized the use of capital letters throughout the manuscript.
- Improved the aesthetics of some figures and tables without altering the content.
- Merged footnotes that link to the same repository.
- Refined the narrative to provide more precise descriptions.
- Updated certain references from their arXiv versions to the published conference or journal versions.
Code: https://github.com/Yinghao-Li/MUBen
Assigned Action Editor: ~Stanislaw_Kamil_Jastrzebski1
Submission Number: 2079