Keywords: protein structure, benchmarking, datasets
TL;DR: Datasets and evaluation tasks for protein 3D structure data.
Abstract: We present ProteinShake, a Python software package that simplifies dataset
creation and model evaluation for deep learning on protein structures. Users
can create custom datasets or load an extensive set of pre-processed datasets from
biological data repositories such as the Protein Data Bank (PDB) and AlphaFoldDB.
Each dataset is associated with prediction tasks and evaluation functions covering
a broad array of biological challenges. A benchmark on these tasks shows that
pre-training almost always improves performance, that the optimal data modality
(graphs, voxel grids, or point clouds) is task-dependent, and that models struggle
to generalize to new structures. ProteinShake makes protein structure data easily accessible
and comparison among models straightforward, providing challenging benchmark
settings with real-world implications.
ProteinShake is available at: https://proteinshake.ai
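
To make the three data modalities named in the abstract concrete, the following is a minimal sketch of how one set of residue coordinates can be viewed as a point cloud, a graph, or a voxel grid. It does not use ProteinShake's API: the function names `eps_graph` and `voxel_grid`, the 8 Å distance cutoff, the 10 Å voxel size, and the toy coordinates are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def eps_graph(coords, eps=8.0):
    """Return undirected edges (i, j), i < j, between points closer than eps."""
    return [
        (i, j)
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
        if math.dist(coords[i], coords[j]) < eps
    ]


def voxel_grid(coords, voxel_size=10.0):
    """Return a sparse occupancy grid: voxel index -> number of points inside."""
    # Shift the grid so that the smallest coordinate on each axis maps to voxel 0.
    origin = [min(axis) for axis in zip(*coords)]
    return Counter(
        tuple(int((c - o) // voxel_size) for c, o in zip(point, origin))
        for point in coords
    )


# Point cloud: one (x, y, z) position per residue, in angstroms (toy values).
point_cloud = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]

edges = eps_graph(point_cloud)   # graph view: residues as nodes, proximity as edges
grid = voxel_grid(point_cloud)   # voxel view: residue counts per cubic cell

print(edges)
print(dict(grid))
```

The point cloud keeps exact coordinates, the graph keeps only which residues are close to each other, and the voxel grid keeps coarse spatial occupancy. This difference in retained information is one plausible reason why the best modality could vary by task.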
Supplementary Material: pdf
Submission Number: 816