InertialTransformer: Early Explorations and Insights into Transformer-based Geometric Representation

Published: 31 Jul 2025, Last Modified: 14 Aug 2025 · LM4Sci · CC BY 4.0
Keywords: language models, structure tokenization, SE(3)-equivariance, molecular property prediction
Abstract: In many biochemical studies, molecular geometries serve as fundamental data structures. Existing deep learning methods often focus on designing SE(3)-equivariant representation functions. However, such functions are physically constrained, which may limit the expressiveness of the models. In this work, we introduce InertialTransformer, a preliminary attempt to address this challenge. InertialTransformer comprises three key components: (1) it uses the inertial frame as a canonicalization method to align molecular geometries in 3D Euclidean space; (2) it incorporates a Euclidean positional-encoding scheme; and (3) it employs a self-attention module to enable information exchange among atoms. By integrating these components, InertialTransformer offers an SE(3)-equivariant yet unconstrained framework for geometric representation. We evaluate InertialTransformer on molecular property prediction tasks. While its performance does not yet match that of state-of-the-art 3D graph neural networks, it significantly outperforms existing SE(3)-equivariant Transformer-based approaches. We posit that InertialTransformer stands to benefit substantially from large-scale pretraining, which we leave as a direction for future work.
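
To make the first component concrete, the sketch below shows one common way to realize inertial-frame canonicalization: center the atoms at the center of mass, build the inertia tensor, and rotate the geometry into its principal axes. This is a minimal NumPy illustration under our own assumptions, not the paper's implementation; the function name `inertial_canonicalize`, the unit-mass default, and the right-handedness convention are illustrative choices.

```python
import numpy as np

def inertial_canonicalize(coords, masses=None):
    """Align a molecular geometry to its principal axes of inertia.

    coords: (N, 3) array of atomic positions.
    masses: (N,) array of atomic masses; unit masses if omitted.
    Returns the centered coordinates expressed in the principal-axis frame.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)

    # Translate to the center of mass (removes the translation part of SE(3)).
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    centered = coords - com

    # Inertia tensor: I = sum_i m_i (|r_i|^2 * Id - r_i r_i^T).
    r2 = (centered ** 2).sum(axis=1)
    inertia = (masses[:, None, None]
               * (r2[:, None, None] * np.eye(3)
                  - centered[:, :, None] * centered[:, None, :])).sum(axis=0)

    # Principal axes are the eigenvectors of the symmetric inertia tensor.
    _, axes = np.linalg.eigh(inertia)

    # Enforce a right-handed frame so the alignment is a proper rotation
    # (det = +1), i.e. we stay within SE(3) rather than E(3).
    if np.linalg.det(axes) < 0:
        axes[:, 0] *= -1

    # Rewrite coordinates in the principal-axis basis.
    return centered @ axes
```

Note that eigenvectors are defined only up to sign, so this sketch is not yet a full canonicalization: a complete method must additionally fix the sign of each axis with a deterministic rule and handle degenerate inertia tensors arising from molecular symmetry. Once the input is canonicalized this way, the remaining components described in the abstract (the positional encoding and self-attention) can be standard, unconstrained Transformer machinery.
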
Submission Number: 18