LieRE: Generalizing Rotary Position Encodings to Higher Dimensional Inputs

Sophie Ostmeier; Brian Axelrod; Michael Moseley; Akshay S Chaudhari; Curtis Langlotz

25 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0

Keywords: Position Encoding, Attention, Transformer, Computer Vision, Machine Learning

TL;DR: LieRE encodes positions of tokens for n-dimensional data by replacing the RoPE rotation matrix with a dense, high-dimensional rotation matrix generated via a learned map.

Abstract: Rotary Position Embeddings (RoPE) have demonstrated efficacy and gained widespread adoption in natural language processing. However, their application to other modalities has been less prevalent. This study introduces Lie group Relative position Encodings (LieRE), which extend beyond RoPE by accommodating n-dimensional inputs. LieRE encodes positions of tokens by replacing the RoPE rotation matrix with a dense, high-dimensional rotation matrix generated via a learned map. We conducted empirical evaluations of LieRE on 2D and 3D image classification tasks, comparing its performance against established baselines including DeiT III, RoPE-Mixed, and Vision-Llama. Our findings reveal significant advancements across multiple metrics compared to the DeiT III baseline. LieRE leads to marked relative improvements in accuracy (10.0% for 2D and 15.1% for 3D). It reaches the same accuracy with a 3.9-fold reduction in training time. It also requires 30% less training data to achieve comparable results. These substantial improvements suggest that LieRE represents a meaningful advancement in positional encoding techniques for multi-dimensional data. The implementation details and reproducibility materials will be made openly available.
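The abstract's central idea, a dense rotation matrix produced from a token's n-dimensional position, can be illustrated with a minimal sketch. It assumes the map takes the form R(p) = exp(sum_d p[d] * A_d) with one skew-symmetric generator A_d per input dimension, which is a natural reading of "Lie group" in the name but is not spelled out in the abstract. The helper names (`skew`, `expm`, `rotation`) are hypothetical, and the generators are random stand-ins for learned parameters.

```python
import random

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(a, squarings=10, terms=12):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in a]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term holds small^k / k! after this update
        term = [[x / k for x in row] for row in matmul(term, small)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def skew(params, n):
    """Build an n x n skew-symmetric matrix from n(n-1)/2 free parameters."""
    a = [[0.0] * n for _ in range(n)]
    values = iter(params)
    for i in range(n):
        for j in range(i + 1, n):
            v = next(values)
            a[i][j] = v
            a[j][i] = -v
    return a

def rotation(position, generators):
    """Assumed form: R(p) = exp(sum_d p[d] * A_d), one generator per input dimension."""
    n = len(generators[0])
    total = [[sum(p * g[i][j] for p, g in zip(position, generators))
              for j in range(n)] for i in range(n)]
    return expm(total)

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

if __name__ == "__main__":
    rng = random.Random(0)
    head_dim, input_dims = 4, 2  # tiny sizes chosen only for illustration
    # Random stand-ins for the learned skew-symmetric generators.
    gens = [skew([rng.uniform(-1, 1) for _ in range(head_dim * (head_dim - 1) // 2)],
                 head_dim) for _ in range(input_dims)]
    q = [rng.uniform(-1, 1) for _ in range(head_dim)]
    k = [rng.uniform(-1, 1) for _ in range(head_dim)]
    # Rotate query and key by their 2D patch positions before the dot product.
    score = dot(apply(rotation([0.0, 1.0], gens), q),
                apply(rotation([2.0, 3.0], gens), k))
    print("position-aware attention logit:", score)
```

Under this assumption each R(p) is orthogonal, so rotating queries and keys changes their relative orientation but not their norms. With a single generator the product R(a)^T R(b) depends only on b - a, as in RoPE. With several dense generators that do not commute, this exact relative property is not guaranteed by the construction in the sketch.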

Supplementary Material: zip

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5013