PLuG-Attention: Unleashing the Potential of Attention via Plug-in Pairwise Logit Gating

ICLR 2026 Conference Submission 18730 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Attention Mechanisms, Vision Transformers, Deformable DETR, Pairwise Logit Gating
TL;DR: We introduce Pairwise Logit Gating (PLuG) attention, a simple yet effective plug-and-play approach that adds a learnable gate operating on each token pair to modulate attention logits prior to softmax.
Abstract: Despite its widespread success on vision tasks, standard attention employs a shared dot-product mechanism that uniformly scores all query–key interactions before applying softmax. In this paper, we hypothesize that explicitly controlling the amplification or suppression of individual query–key token-pair interactions can lead to more expressive and discriminative representations. To this end, we propose \textbf{Pairwise Logit Gating (PLuG)} attention, a simple yet effective plug-in approach that introduces a learnable gating mechanism operating on each token pair to modulate attention logits prior to softmax. This gating enables the model to selectively amplify informative interactions and suppress spurious ones through a gating coefficient matrix, improving its ability to capture the spatial and semantic relationships critical for vision tasks. Experimental results demonstrate that PLuG can be seamlessly integrated into various attention mechanisms and attention-based architectures, including ViTs and Mask2Former, as well as multi-scale deformable attention in Deformable DETR, without requiring architectural redesign or hyperparameter tuning. These results highlight the effectiveness of PLuG as a general-purpose plug-in enhancement broadly applicable to attention-based vision tasks.
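The abstract describes gating attention logits per query–key pair before softmax. A minimal sketch of how such pairwise logit gating could be wired into a standard multi-head attention block is shown below; the class name `PLuGAttention`, the `gate` parameter shape, and the multiplicative `(1 + gate)` form are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of pairwise logit gating (not the authors' released code).
# A learnable per-head coefficient matrix modulates attention logits before softmax.
import torch
import torch.nn as nn


class PLuGAttention(nn.Module):
    def __init__(self, dim, num_heads=8, num_tokens=197):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable per-head gating coefficients over token pairs,
        # initialized to zero so the gate starts as the identity mapping.
        self.gate = nn.Parameter(torch.zeros(num_heads, num_tokens, num_tokens))

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # each: (B, H, N, d)
        logits = (q @ k.transpose(-2, -1)) * self.scale  # (B, H, N, N)
        # Pairwise gating: amplify or suppress individual query-key logits
        # before softmax via the learned coefficient matrix.
        logits = logits * (1.0 + self.gate[:, :N, :N])
        attn = logits.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Under these assumptions, the module is a drop-in replacement for the attention layer of a ViT block, consistent with the plug-in usage the abstract claims.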
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18730