AtoMAE: Learning Protein Structure Representations from Atomic Voxel Grids via Masked Autoencoders

Published: 11 Jun 2025, Last Modified: 18 Jul 2025 · GenBio 2025 Poster · CC BY 4.0
Keywords: protein structure, protein structure modeling, protein representation learning, self-supervised learning, masked autoencoders, vision transformers, voxel, inductive bias
TL;DR: AtoMAE learns protein structure representations solely from atomic voxel grids via masked autoencoders, outperforming protein language modeling and graph neural networks while requiring minimal biological prior knowledge
Abstract: We propose AtoMAE (Atomistic Transformer with Masked Autoencoder) for deciphering three-dimensional protein structures using limited biological prior knowledge. Rather than relying on amino acid identifiers or backbone markers, the model uses voxelized protein structures with atom types as its sole input. These atomic voxels allow the use of a Vision Transformer architecture pre-trained via the Masked Autoencoder framework. Through its self-supervised reconstruction approach, AtoMAE preserves spatial context while achieving superior performance and scalability without strong inductive biases or complex modules. In structural classification, AtoMAE outperforms both protein language modeling and graph neural networks by effectively capturing both short- and long-range relationships. Furthermore, AtoMAE can predict residue identities from backbone structures alone, achieving accuracy comparable to inverse folding models while preserving architectural simplicity. These results encourage a design shift towards models that autonomously learn multi-level biological understanding, from structure to residue, instead of relying on architectures with deeply encoded domain knowledge.
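The abstract's core input representation — a voxel grid whose only annotation is the atom type at each occupied cell — can be illustrated with a minimal sketch. The grid size, resolution, and element vocabulary below are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

# Assumed element vocabulary; the paper's actual atom-type set may differ.
ATOM_TYPES = ["C", "N", "O", "S"]

def voxelize(coords, elements, grid_size=32, resolution=1.0):
    """Place atoms into a one-hot occupancy grid, one channel per atom type.

    coords:   (n_atoms, 3) Cartesian coordinates in angstroms
    elements: element symbol per atom
    Returns a (n_types, grid_size, grid_size, grid_size) float array.
    """
    grid = np.zeros((len(ATOM_TYPES),) + (grid_size,) * 3, dtype=np.float32)
    center = coords.mean(axis=0)                     # center structure in grid
    idx = np.floor((coords - center) / resolution + grid_size / 2).astype(int)
    for (x, y, z), elem in zip(idx, elements):
        if elem in ATOM_TYPES and all(0 <= v < grid_size for v in (x, y, z)):
            grid[ATOM_TYPES.index(elem), x, y, z] = 1.0  # mark occupied voxel
    return grid

# Toy example: three atoms of different types near the origin.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
grid = voxelize(coords, ["C", "N", "O"])
print(grid.shape, int(grid.sum()))  # (4, 32, 32, 32) 3
```

Such a grid can then be split into 3D patches and masked for MAE-style pretraining, exactly as 2D image patches are in a standard Vision Transformer pipeline.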
Submission Number: 56