Dimension Debate: Is 3D a Step Too Far for Optimizing Molecules?

27 Sept 2024 (modified: 05 Feb 2025), submitted to ICLR 2025, CC BY 4.0
Keywords: Bayesian optimization, molecular representation, surrogate models, transfer learning
TL;DR: Investigation of the underutilization of 3D molecular features in Bayesian optimization for materials discovery, comparing their effectiveness against 1D and 2D features through large-scale evaluation across multiple datasets.
Abstract: The discovery of new molecular materials with desirable properties is essential for technological advancements, from pharmaceuticals to renewable energy. However, the discovery process is arduous, requiring many trial-and-error cycles of complex and expensive experiments. Bayesian optimization (BO) is commonly used to find and screen candidate molecules efficiently. Yet it is unclear how to choose the right molecular representation for a Bayesian surrogate model: while molecules are three-dimensional in nature, 3D features remain largely underexplored in BO. Instead, 1D and 2D molecular features, which incur a loss of information, are typically used. In this work, we study this discrepancy: why have 3D features been overlooked for BO in materials discovery? To this end, we evaluate 3D features against standard lower-dimensional features. We assess their optimization performance on real-world chemistry datasets across several settings, including low- and high-data regimes and transfer learning, and across different types of Bayesian surrogates. This amounts to 35 different setups per dataset, totaling over 2100 distinct runs. Our large-scale study provides insights and modeling guidance for chemists and practitioners on the trade-offs between 1D, 2D, and 3D representations, in a bid to further accelerate materials discovery.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 11069