ATOM3D: Tasks on Molecules in Three Dimensions

Published: 29 Jul 2021, Last Modified: 22 Oct 2023
Venue: NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Readers: Everyone
Keywords: machine learning, structural biology, biomolecules
TL;DR: ATOM3D is a collection of benchmark datasets for learning algorithms that work with 3D biomolecular structure.
Abstract: Computational methods that operate on three-dimensional (3D) molecular structure have the potential to solve important problems in biology and chemistry. Deep neural networks have gained significant attention, but their widespread adoption in the biomolecular domain has been limited by a lack of either systematic performance benchmarks or a unified toolkit for interacting with 3D molecular data. To address this, we present ATOM3D, a collection of both novel and existing benchmark datasets spanning several key classes of biomolecules. We implement several types of 3D molecular learning methods for each of these tasks and show that they consistently improve performance relative to methods based on one- and two-dimensional representations. The choice of architecture proves to be important for performance, with 3D convolutional networks excelling at tasks involving complex geometries, graph networks performing well on systems requiring detailed positional information, and the more recently developed equivariant networks showing significant promise. Our results indicate that many molecular problems stand to gain from 3D molecular learning, and that there is potential for substantial further improvement on many tasks. To lower the barrier to entry and facilitate further developments in the field, we also provide a comprehensive suite of tools for dataset processing, model training, and evaluation in our open-source atom3d Python package. All datasets are available for download from www.atom3d.ai.
Supplementary Material: zip
URL: www.atom3d.ai
Contribution Process Agreement: Yes
Dataset Url: https://www.atom3d.ai/
License: The ATOM3D code is licensed under the MIT license. The datasets are licensed under the following licenses: SMP: Creative Commons CC-BY license. PIP: Creative Commons CC-BY license. RES: Creative Commons CC-BY license. MSP: Creative Commons CC-BY license. LBA: Creative Commons NonCommercial-NoDerivs (CC-BY-NC-ND) license. LEP: Creative Commons CC-BY license. PSR: Creative Commons CC-BY license. RSR: Creative Commons Attribution-ShareAlike 3.0 Unported (CC-SA) license.
Author Statement: Yes
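
Illustration: the abstract contrasts graph networks and 3D convolutional networks, which consume 3D atomic coordinates in different forms. The sketch below is only an illustration of those two input representations, not the atom3d package's API or the authors' preprocessing. The toy water-like molecule, the function names `radius_graph` and `voxelize`, and the cutoff and voxel-size values are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy molecule: (element, x, y, z) in angstroms.
# Approximate water geometry; not taken from any ATOM3D dataset.
ATOMS = [
    ("O", 0.000, 0.000, 0.000),
    ("H", 0.957, 0.000, 0.000),
    ("H", -0.240, 0.927, 0.000),
]


def radius_graph(atoms, cutoff):
    """Graph-style input: edges (i, j, distance) for atom pairs closer than cutoff."""
    edges = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i][1:], atoms[j][1:])
            if d < cutoff:
                edges.append((i, j, d))
    return edges


def voxelize(atoms, voxel_size):
    """Grid-style input: atom counts keyed by (element, integer voxel index)."""
    grid = Counter()
    for element, x, y, z in atoms:
        cell = tuple(math.floor(c / voxel_size) for c in (x, y, z))
        grid[(element, cell)] += 1
    return grid


if __name__ == "__main__":
    for i, j, d in radius_graph(ATOMS, cutoff=1.2):
        print(f"edge {i}-{j}: {d:.3f} A")
    for (element, cell), count in sorted(voxelize(ATOMS, voxel_size=1.0).items()):
        print(element, cell, count)
```

The graph form keeps exact pairwise distances, while the grid form discretizes positions into per-element occupancy channels. Real pipelines typically add atom features and handle rotations or data augmentation, which this sketch omits.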