Keywords: protein representation learning, self-supervised learning
TL;DR: We propose SLAE, an all-atom autoencoder with a multi-task pretraining objective that yields transferable and chemically faithful protein representations.
Abstract: Building physically grounded protein representations is central to computational biology, yet most existing approaches rely on sequence-pretrained language models or backbone-only graphs that overlook side-chain geometry and chemical detail. We present SLAE, a unified all-atom framework for learning protein representations from each residue’s local atomic neighborhood using only atom types and interatomic geometries. To encourage expressive feature extraction, we introduce a novel multi-task autoencoder objective that combines coordinate reconstruction, sequence recovery, and energy regression. SLAE reconstructs all-atom structures with high fidelity from latent residue environments and achieves state-of-the-art performance across diverse downstream tasks via transfer learning. SLAE's latent space is chemically informative and environmentally sensitive, enabling quantitative assessment of structural quality and smooth interpolation between conformations at all-atom resolution.
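The abstract names three pretraining terms (coordinate reconstruction, sequence recovery, and energy regression). As a rough illustration only, a minimal PyTorch sketch of such a combined objective might look like the following; the specific loss forms, weighting scheme, and all names here are assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical task weights; the abstract does not specify how the terms are balanced.
W_COORD, W_SEQ, W_ENERGY = 1.0, 1.0, 1.0

def multitask_loss(pred_coords, true_coords,
                   seq_logits, true_seq,
                   pred_energy, true_energy):
    """Sketch of a combined multi-task pretraining objective."""
    # Coordinate reconstruction: mean-squared error over all atom positions.
    l_coord = F.mse_loss(pred_coords, true_coords)
    # Sequence recovery: cross-entropy over the 20 amino-acid classes per residue.
    l_seq = F.cross_entropy(seq_logits, true_seq)
    # Energy regression: mean-squared error against a scalar energy target.
    l_energy = F.mse_loss(pred_energy, true_energy)
    return W_COORD * l_coord + W_SEQ * l_seq + W_ENERGY * l_energy
```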
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 21810