Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

ICLR 2026 Conference Submission 414 Authors

01 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: deep neural network
TL;DR: The paper proposes a fundamental operation with the potential to serve as a building block for the next generation of deep neural networks.
Abstract: When modeling a given type of data, we consider two key aspects: (1) identifying the elements (e.g., image pixels or textual words) relevant to a central element, as in a convolutional receptive field, or to a query element, as in self-attention; and (2) encoding these elements effectively. Self-attention can adaptively identify relevant elements but relies on absolute positional embeddings for structural representation learning. In contrast, convolution encodes elements in a relative manner, yet its fixed kernel size limits its ability to adaptively select relevant elements. In this paper, we introduce Translution, an operation that unifies the adaptive identification capability of self-attention with the relative encoding advantage of convolution. However, this integration leads to a substantial increase in the number of parameters, exceeding most currently available computational resources. We therefore propose a lightweight variant of Translution, named LoR-Translution. Experiments on computer vision and natural language processing tasks show that Translution (including LoR-Translution) achieves higher accuracy than self-attention, demonstrating its potential as a building block for the next generation of deep neural networks. The code is included in the supplementary materials and will be released publicly.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 414
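
The abstract does not give Translution's exact formulation (the paper and code are in the supplementary material), but the snippet below is a minimal, purely illustrative sketch of one plausible reading of the general idea: attention weights are computed adaptively as in self-attention, while the value of each element is projected by a weight that depends on its relative offset from the query position, with those per-offset weights factorized at low rank to curb the parameter blow-up the abstract mentions. The class name LowRankRelativeAttention, the `rank` and `max_len` arguments, and the specific factorization are assumptions for illustration, not the paper's actual design or API.

```python
import torch
import torch.nn as nn


class LowRankRelativeAttention(nn.Module):
    """Hypothetical sketch (not the paper's implementation): attention whose
    value projection depends on the relative offset between query and key
    positions, factorized with a low rank to keep the parameter count small."""

    def __init__(self, dim, max_len, rank=8):
        super().__init__()
        self.dim = dim
        self.max_len = max_len
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        # One low-rank factor per relative offset in [-(max_len-1), max_len-1].
        self.rel_down = nn.Parameter(torch.randn(2 * max_len - 1, dim, rank) * 0.02)
        self.rel_up = nn.Parameter(torch.randn(rank, dim) * 0.02)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, seq_len, dim); seq_len must not exceed max_len.
        b, n, d = x.shape
        q = self.q_proj(x)  # (b, n, d)
        k = self.k_proj(x)  # (b, n, d)
        # Adaptive selection of relevant elements, as in self-attention.
        attn = torch.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)  # (b, n, n)

        # Relative offsets j - i, shifted to non-negative indices.
        idx = torch.arange(n, device=x.device)
        rel = idx[None, :] - idx[:, None] + self.max_len - 1  # (n, n)
        # Offset-dependent projection W[j - i] ~= rel_down[j - i] @ rel_up.
        w = self.rel_down[rel] @ self.rel_up  # (n, n, d, d)

        # Value of element j as seen from position i: x_j projected by W[j - i].
        # NOTE: materializing w and v is memory-heavy (O(n^2 d^2) and O(n^2 d));
        # a practical implementation would need fused or chunked computation.
        v = torch.einsum('bjd,ijde->bije', x, w)   # (b, n, n, d)
        out = torch.einsum('bij,bije->bie', attn, v)  # (b, n, d)
        return self.out_proj(out)
```

For a quick shape check under these assumptions, LowRankRelativeAttention(dim=64, max_len=128, rank=8) applied to a (2, 128, 64) tensor returns a (2, 128, 64) tensor. The relative projections cost (2·max_len − 1)·dim·rank + rank·dim parameters instead of (2·max_len − 1)·dim² for an unfactorized per-offset weight, which is the kind of reduction a "LoR" (low-rank) variant would plausibly target.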