Position-Aware Attention Mechanism: A Mathematical Framework for Enhanced Spatial Information Processing in Transformer Architectures

13 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Attention Mechanism, Position-Aware Attention, Triple-Attention, Parameter Sensitivity Analysis
TL;DR: We propose a triple-attention architecture that combines position-aware, task-aware, and content-aware attention mechanisms with mathematical frameworks for optimal information distribution and adaptive attention allocation.
Abstract: We propose a position-aware attention mechanism based on the EPAR (Explicit Position-Attention Relationship) framework, which addresses the limitations of traditional attention mechanisms in capturing positional relationships by means of a parametric positional effect function. The EPAR framework establishes explicit mathematical relationships between positional distance and attention intensity using three key parameters: $\alpha$ (position influence intensity), $\beta$ (spatial decay rate), and $\gamma$ (enhancement coefficient for long-range dependencies). We prove mathematical properties of this function (continuity, differentiability, monotonicity) and demonstrate fine-grained control over attention allocation. To address over-attenuation at long distances, we introduce the enhancement coefficient $\gamma$, which ensures a non-zero lower bound for attention weights. We develop an adaptive triple-attention architecture with task-aware and content-aware modules for dynamic weight adjustment. Our method includes a maximum benefit position formula and a consistency metric for evaluation. Experimental results show superior performance in structured and clustered scenarios, particularly for information retrieval and document understanding tasks, demonstrating advantages over existing position encoding methods including RoPE and relative position encoding.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4703
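
The abstract names the three parameters of the positional effect function but does not give its closed form. The following is a minimal illustrative sketch, assuming an exponential decay with an additive floor, $f(d) = \alpha e^{-\beta |d|} + \gamma$. This form is an assumption chosen only because it is continuous, monotonically decreasing in $|d|$, and bounded below by $\gamma$; it is not necessarily the function used in the paper. The names `positional_effect` and `position_aware_weights` are hypothetical.

```python
import math


def positional_effect(distance, alpha=1.0, beta=0.5, gamma=0.1):
    """Assumed form (not taken from the paper): alpha * exp(-beta * |d|) + gamma.

    alpha scales the positional influence, beta sets the decay rate,
    and gamma keeps the value from decaying to zero at long range.
    """
    return alpha * math.exp(-beta * abs(distance)) + gamma


def position_aware_weights(scores, query_pos, alpha=1.0, beta=0.5, gamma=0.1):
    """Reweight softmax attention for one query by the assumed positional effect.

    scores[k] is the raw content score between the query and key k;
    key positions are taken to be the list indices 0, 1, 2, ...
    """
    max_score = max(scores)  # subtract the max for numerical stability
    unnormalised = [
        math.exp(s - max_score) * positional_effect(k - query_pos, alpha, beta, gamma)
        for k, s in enumerate(scores)
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


if __name__ == "__main__":
    # With equal content scores, the weights depend on position only.
    weights = position_aware_weights([0.0] * 5, query_pos=0)
    print([round(w, 4) for w in weights])
```

Under this assumed form, setting `gamma=0` recovers pure exponential decay, so distant keys receive vanishing weight; a positive `gamma` keeps every positional factor at or above `gamma`, which is one way to read the non-zero lower bound described in the abstract.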