Rethinking Transformer through Dual Banach Spaces

ICLR 2026 Conference Submission 19270 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Transformer, Dual Banach Space, Regularization Method
TL;DR: We introduce a novel interpretation of Transformers through dual Banach spaces, proving attention acts as a dual operator and feed-forward networks as corrective mechanisms.
Abstract: Transformers have significantly advanced deep learning across multiple domains, yet the theoretical foundations of their structure remain an open area of research. In this paper, we introduce a novel perspective by interpreting Transformers through the framework of dual Banach spaces. Specifically, we prove that the exponentiated query-key kernel in the attention mechanism can be interpreted as a bilinear form on Banach spaces. Building on this, we provide a theoretical proof that the attention mechanism in Transformers can be viewed as a dual space operator, while feed-forward networks function as a correction mechanism between the dual and primal solutions. To demonstrate the benefits of the dual Banach space perspective, we show how this framework introduces a novel form of regularization for Transformers. These findings offer new insights into understanding and potentially improving Transformer architectures using principled mathematical frameworks.
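The exponentiated query-key kernel referenced in the abstract is the quantity the paper reinterprets as a bilinear form; the sketch below is only an illustration of that object in standard scaled dot-product attention, not the authors' implementation, and all function names are illustrative.

```python
# Minimal sketch (assumption: standard scaled dot-product attention) showing the
# exponentiated query-key kernel exp(<q, k> / sqrt(d)) as an explicitly
# exponentiated bilinear form, with attention obtained by normalizing it over keys.
import numpy as np


def exp_qk_kernel(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Exponentiated bilinear form B(q, k) = exp(<q, k> / sqrt(d))."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Subtract the row-wise max before exponentiating for numerical stability;
    # this cancels out after normalization and leaves the attention weights unchanged.
    return np.exp(scores - scores.max(axis=-1, keepdims=True))


def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Attention as the kernel row-normalized over keys, then applied to values."""
    B = exp_qk_kernel(Q, K)                 # (n_q, n_k) kernel matrix
    W = B / B.sum(axis=-1, keepdims=True)   # row normalization = softmax over keys
    return W @ V                            # weighted average of value vectors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))
    K = rng.normal(size=(6, 8))
    V = rng.normal(size=(6, 8))
    print(attention(Q, K, V).shape)  # (4, 8)
```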
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 19270