Transformers trained on proteins can learn to attend to Euclidean distance

Isaac Ellmen; Constantin Schneider; Matthew I. J. Raybould; Charlotte Deane

Published: 24 Jul 2025, Last Modified: 24 Jul 2025. Accepted by TMLR. License: CC BY 4.0.

Abstract: While conventional Transformers generally operate on sequence data, they can be used in conjunction with structure models, typically SE(3)-invariant or equivariant graph neural networks (GNNs), for 3D applications such as protein structure modelling. These hybrids usually involve either (1) preprocessing/tokenizing structural features as input for Transformers or (2) taking Transformer embeddings and processing them within a structural representation. However, there is evidence that Transformers can learn to process structural information on their own; the AlphaFold3 structural diffusion model is one example. In this work we show that Transformers can function independently as structure models when passed linear embeddings of coordinates. We first provide a theoretical explanation for how Transformers can learn to filter attention as a 3D Gaussian with learned variance. We then validate this theory both on simulated 3D points and in the context of masked token prediction for proteins. Finally, we show that pre-training protein Transformer encoders with structure improves performance on multiple downstream tasks, yielding performance competitive with custom structural models. Together, this work provides a basis for using standard Transformers as hybrid structure-language models. The code is available at: https://github.com/oxpig/attending-to-distance.
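The core idea that dot-product attention can act as a 3D Gaussian filter over distance can be illustrated with a small sketch. It is not the paper's construction: purely for illustration, it assumes the key embedding carries an extra channel holding |x_j|² alongside the linear coordinate embedding, and it fixes σ rather than learning it. Expanding -|x_i - x_j|²/(2σ²) gives (2 x_i·x_j - |x_j|² - |x_i|²)/(2σ²); the |x_i|² term is constant for a given query, and softmax is invariant to per-query constants, so the attention weights should match a normalized Gaussian up to floating-point error. The helper names below are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(points, i, sigma):
    """Softmax dot-product attention from query point i.

    Illustrative (assumed) embeddings, not the paper's exact parameterization:
      query q_i = [x_i, 1]
      key   k_j = [x_j / sigma^2, -|x_j|^2 / (2 sigma^2)]
    so q_i . k_j = (2 x_i.x_j - |x_j|^2) / (2 sigma^2).
    """
    q = list(points[i]) + [1.0]
    scores = []
    for p in points:
        k = [c / sigma**2 for c in p] + [-dot(p, p) / (2 * sigma**2)]
        scores.append(dot(q, k))
    return softmax(scores)


def gaussian_weights(points, i, sigma):
    """Normalized 3D Gaussian filter exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    xi = points[i]
    w = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(xi, p)) / (2 * sigma**2))
        for p in points
    ]
    total = sum(w)
    return [v / total for v in w]


# Random 3D points; compare attention from point 0 with the Gaussian filter.
random.seed(0)
pts = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(8)]
attn = attention_weights(pts, 0, sigma=2.0)
gauss = gaussian_weights(pts, 0, sigma=2.0)
print(max(abs(a - g) for a, g in zip(attn, gauss)))  # near 0 (rounding error only)
```

In this sketch σ sets how quickly attention decays with distance: a smaller σ concentrates the weights on the query's nearest neighbours, which is the role the abstract attributes to the learned variance.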

Submission Length: Regular submission (no more than 12 pages of main content)

Code: https://github.com/oxpig/attending-to-distance

Supplementary Material: zip

Assigned Action Editor: Serguei Barannikov

Submission Number: 4445
