Track: long paper (up to 8 pages)
Keywords: Autoregressive Models; 3D Molecule Generation; SE(3) Symmetry
Abstract: Transformer-based autoregressive models have emerged as a unifying paradigm across modalities such as text and images, but their extension to 3D molecule generation remains underexplored. The gap stems from two fundamental challenges: (1) how to tokenize molecules into a canonical 1D sequence of tokens that is invariant to both SE(3) transformations and atom index permutations, and (2) how to design an architecture capable of modeling hybrid atom-based tokens that couple discrete atom types with continuous 3D coordinates. To address these challenges, we introduce InertialAR. It first performs generation-oriented canonical tokenization by aligning each molecule to a canonical inertial frame and reordering atoms, thereby converting arbitrary 3D structures into a unique, SE(3)- and permutation-invariant sequence of tokens for autoregressive generation. Built upon this canonical tokenization, we propose geometric rotary positional encoding (GeoRoPE), which endows Transformer attention with 3D geometric awareness. Finally, InertialAR utilizes a hierarchical autoregressive paradigm to decode the next atom, consecutively predicting the atom type and 3D coordinates via Diffusion Loss. Experimentally, InertialAR achieves state-of-the-art performance on 8 of the 10 evaluation metrics for unconditional generation across QM9, GEOM-Drugs, and B3LYP. Moreover, it significantly outperforms baselines in controllable generation for targeted chemical functionality, attaining state-of-the-art results on all 5 metrics.
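The canonical tokenization step described in the abstract can be illustrated with a rough sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function name `canonicalize` is hypothetical, unit atomic masses are used by default, the sign of each principal axis is fixed with a third-moment convention, and atoms are reordered lexicographically by their aligned coordinates. Degenerate cases (for example, highly symmetric molecules with repeated principal moments) are not handled.

```python
import numpy as np


def canonicalize(atom_types, coords, masses=None):
    """Illustrative sketch only: align a molecule to its principal axes of
    inertia and reorder its atoms. Not the paper's actual procedure."""
    atom_types = np.asarray(atom_types)
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Assumption: unit masses unless real atomic masses are supplied.
    masses = np.ones(n) if masses is None else np.asarray(masses, dtype=float)

    # Remove translation: move the center of mass to the origin.
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    x = coords - com

    # Inertia tensor: sum_i m_i (|r_i|^2 I - r_i r_i^T).
    r2 = (x ** 2).sum(axis=1)
    inertia = (masses * r2).sum() * np.eye(3) - (masses[:, None] * x).T @ x

    # Columns of `axes` are principal axes, ordered by ascending moment.
    _, axes = np.linalg.eigh(inertia)

    # Eigenvectors are defined only up to sign. Assumed convention: make the
    # mass-weighted third moment along the first two axes non-negative.
    for k in range(2):
        proj = x @ axes[:, k]
        if (masses * proj ** 3).sum() < 0:
            axes[:, k] = -axes[:, k]
    # Third axis from a cross product, so the frame is right-handed
    # (a proper rotation, no reflection).
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])

    # Coordinates expressed in the inertial frame.
    aligned = x @ axes

    # Remove index permutation: assumed lexicographic order on (x, y, z),
    # rounded to damp floating-point noise.
    key = np.round(aligned, 6)
    order = np.lexsort((key[:, 2], key[:, 1], key[:, 0]))
    return atom_types[order], aligned[order]
```

Under these assumptions, applying any proper rotation, translation, or atom reindexing to the input leaves the returned sequence unchanged up to numerical tolerance, which is the property the abstract requires of a canonical sequence for autoregressive generation.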
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 43