ProteinVista: A compute-efficient atom-level 3D CNN that outperforms sequence transformers in protein–ligand prediction

11 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: 3D CNN, protein representations, protein–ligand prediction, AlphaFold structures, structure-based learning, enzyme–substrate classification, drug–target interactions, contrastive pre-training
TL;DR: ProteinVista is a compute-efficient atom-level 3D CNN trained on AlphaFold structures that outperforms sequence-based transformers on structure-dependent tasks.
Abstract: Protein function emerges from three-dimensional geometry, but many large-scale prediction pipelines still rely on linear sequence embeddings alone. Although structure-aware protein graph neural networks add residue connectivity, they omit atom-level details and therefore struggle to encode the detailed chemistry of binding sites. Here, we introduce ProteinVista, a compute-efficient 3D convolutional neural network that voxelizes every heavy atom, learns rotation-robust representations through 3D data augmentation, and is pre-trained on over 500,000 AlphaFold-2 structures, more than two orders of magnitude less data than is used to train state-of-the-art protein language models. Despite its compact size of 123 million parameters, ProteinVista outperforms sequence transformers on three benchmarks that require fine structural resolution: enzyme–substrate classification, transporter–substrate classification, and drug–target inhibition prediction. A simple ensemble with ESM-2 can further improve accuracy, indicating that sequence and structure signals are partly complementary. The results demonstrate that full-atom 3D CNNs are both tractable and superior to protein transformers for structure-dependent tasks. An open-source Python implementation makes ProteinVista easily accessible for application and fine-tuning.
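As a rough illustration of the two preprocessing ideas named in the abstract (voxelizing heavy atoms and applying random 3D rotations as augmentation), the following is a minimal NumPy sketch. It is not the ProteinVista implementation: the function names, the one-channel-per-element encoding, the grid size of 32, and the 1 Å voxel size are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # normalize column signs
    if np.linalg.det(q) < 0:         # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


def voxelize(coords, channels, n_channels, grid_size=32, voxel_size=1.0):
    """Count atoms per voxel on a (n_channels, G, G, G) grid centered on the centroid.

    coords:   (N, 3) heavy-atom coordinates (hypothetical input format)
    channels: (N,) integer channel per atom, e.g. an element index (assumed encoding)
    """
    grid = np.zeros((n_channels, grid_size, grid_size, grid_size), dtype=np.float32)
    centered = coords - coords.mean(axis=0)
    idx = np.floor(centered / voxel_size).astype(int) + grid_size // 2
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)  # drop atoms outside the box
    idx, ch = idx[inside], channels[inside]
    np.add.at(grid, (ch, idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid


def augmented_voxels(coords, channels, n_channels, rng, **kwargs):
    """Rotate the structure randomly, then voxelize it (one augmented sample)."""
    rot = random_rotation(rng)
    return voxelize(coords @ rot.T, channels, n_channels, **kwargs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(scale=2.0, size=(50, 3))   # toy coordinates, not a real protein
    channels = rng.integers(0, 4, size=50)         # e.g. C, N, O, S (assumed channels)
    grid = augmented_voxels(coords, channels, n_channels=4, rng=rng)
    print(grid.shape)
```

Each call to the hypothetical `augmented_voxels` presents the same structure in a different orientation, which is one common way to encourage a 3D CNN to learn rotation-robust features without building rotational equivariance into the architecture.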
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 4178