Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

ICLR 2026 Conference Submission 414 Authors

01 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: deep neural network
TL;DR: The paper proposes a fundamental operation with the potential to serve as a building block for the next generation of deep neural networks.
Abstract: When modeling a given type of data, we consider two key aspects: (1) identifying the elements (e.g., image pixels or textual words) relevant to a central element, as in a convolutional receptive field, or to a query element, as in self-attention; and (2) encoding these elements effectively. Self-attention can adaptively identify relevant elements but relies on absolute positional embeddings for structural representation learning. In contrast, convolution encodes elements in a relative manner, yet its fixed kernel size limits its ability to adaptively select relevant elements. In this paper, we introduce Translution, an operation that unifies the adaptive identification capability of self-attention with the relative encoding advantage of convolution. However, this integration leads to a substantial increase in the number of parameters, exceeding most currently available computational resources. We therefore propose a lightweight variant of Translution, named LoR-Translution. Experiments on computer vision and natural language processing tasks show that Translution (including LoR-Translution) achieves higher accuracy than self-attention, demonstrating its potential as a building block for the next generation of deep neural networks. The code is included in the supplementary materials and will be released publicly.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 414
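
The abstract does not give Translution's exact formulation (the paper and code are in the supplementary material), but the snippet below is a minimal, purely illustrative sketch of one plausible reading of the general idea: attention weights are computed adaptively as in self-attention, while the value of each element is projected by a weight that depends on its relative offset from the query position, with those per-offset weights factorized at low rank to curb the parameter blow-up the abstract mentions. The class name LowRankRelativeAttention, the `rank` and `max_len` arguments, and the specific factorization are assumptions for illustration, not the paper's actual design or API.

```python
import torch
import torch.nn as nn


class LowRankRelativeAttention(nn.Module):
    """Hypothetical sketch (not the paper's implementation): attention whose
    value projection depends on the relative offset between query and key
    positions, factorized with a low rank to keep the parameter count small."""

    def __init__(self, dim, max_len, rank=8):
        super().__init__()
        self.dim = dim
        self.max_len = max_len
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        # One low-rank factor per relative offset in [-(max_len-1), max_len-1].
        self.rel_down = nn.Parameter(torch.randn(2 * max_len - 1, dim, rank) * 0.02)
        self.rel_up = nn.Parameter(torch.randn(rank, dim) * 0.02)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, seq_len, dim); seq_len must not exceed max_len.
        b, n, d = x.shape
        q = self.q_proj(x)  # (b, n, d)
        k = self.k_proj(x)  # (b, n, d)
        # Adaptive selection of relevant elements, as in self-attention.
        attn = torch.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)  # (b, n, n)

        # Relative offsets j - i, shifted to non-negative indices.
        idx = torch.arange(n, device=x.device)
        rel = idx[None, :] - idx[:, None] + self.max_len - 1  # (n, n)
        # Offset-dependent projection W[j - i] ~= rel_down[j - i] @ rel_up.
        w = self.rel_down[rel] @ self.rel_up  # (n, n, d, d)

        # Value of element j as seen from position i: x_j projected by W[j - i].
        # NOTE: materializing w and v is memory-heavy (O(n^2 d^2) and O(n^2 d));
        # a practical implementation would need fused or chunked computation.
        v = torch.einsum('bjd,ijde->bije', x, w)   # (b, n, n, d)
        out = torch.einsum('bij,bije->bie', attn, v)  # (b, n, d)
        return self.out_proj(out)
```

For a quick shape check under these assumptions, LowRankRelativeAttention(dim=64, max_len=128, rank=8) applied to a (2, 128, 64) tensor returns a (2, 128, 64) tensor. The relative projections cost (2·max_len − 1)·dim·rank + rank·dim parameters instead of (2·max_len − 1)·dim² for an unfactorized per-offset weight, which is the kind of reduction a "LoR" (low-rank) variant would plausibly target.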