Keywords: Quantum Chemistry, Machine Learning, Organic Molecules
Abstract: Artificial intelligence is revolutionizing computational chemistry, bringing unprecedented innovation and efficiency to the field. To further advance research and expedite progress, we introduce the Quantum Open Organic Molecular (QO2Mol) database, a large-scale quantum chemistry dataset for research on organic molecules, released under an open-source license.
The database comprises 120,000 organic molecules and more than 20 million conformers, encompassing 10 different elements (C, H, O, N, S, P, F, Cl, Br, I), with heavy atom counts exceeding 40. Each conformation was computed at the B3LYP/def2-SVP level of theory to derive quantum mechanical properties, including potential energy and forces. The molecules included in the dataset are based on fragments from compounds in ChEMBL, ensuring their structural relevance to real-world compounds.
The extensive variety of molecular structures and elemental compositions represented in the dataset can facilitate the construction of potential energy surfaces and support various downstream tasks.
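As a rough illustration of the composition constraints described above, the following minimal sketch checks whether a molecule uses only the ten listed elements and counts its heavy (non-hydrogen) atoms. It assumes a molecule is given as a plain list of element symbols; that representation and the helper names `heavy_atom_count` and `in_element_set` are hypothetical and do not reflect the dataset's actual file format or tooling.

```python
# Hypothetical helpers for illustration only; the dataset's real format
# and loader are not assumed here.

# The ten elements listed in the abstract.
ALLOWED_ELEMENTS = frozenset({"C", "H", "O", "N", "S", "P", "F", "Cl", "Br", "I"})


def heavy_atom_count(symbols):
    """Count non-hydrogen atoms in a list of element symbols."""
    return sum(1 for s in symbols if s != "H")


def in_element_set(symbols):
    """Return True if every atom is one of the ten allowed elements."""
    return all(s in ALLOWED_ELEMENTS for s in symbols)


# Toy input: ethanol (C2H6O) written as a flat list of element symbols.
ethanol = ["C", "C", "O"] + ["H"] * 6

print(heavy_atom_count(ethanol), in_element_set(ethanol))
```

A check of this kind could be used to confirm that a candidate molecule falls within the elemental coverage described in the abstract before evaluating a model trained on the data.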
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6862