Abstract: Self-attention mechanisms in transformers enable tokens to interact across a sequence but lack an explicit inductive bias for local contextual dependencies, an inherent characteristic of human languages. We propose Token-Wise Kernels (TWiKers), a novel enhancement to transformers that learns token-specific convolutional kernels applied to the keys or values. Each token is assigned a small kernel, initialized to the "Central Dirac" (e.g., [0, 1, 0] for size 3), meaning that the token alone receives the attention directed at its position. During training, these kernels adapt, and greater deviation from the Central Dirac indicates stronger redistribution of attention to neighboring tokens. To our knowledge, this introduces the first transformer weights with direct semantic interpretability. Our experiments show that content words (e.g., nouns and verbs) retain self-focus, while function words (e.g., prepositions and conjunctions) shift attention toward their neighbors, consistent with their syntactic and semantic roles. We further apply TWiKers to distinguish literary genres, historical periods, and authors, demonstrating their effectiveness in capturing high-level stylistic patterns. Finally, by allowing the kernels to vary across attention heads, we show the potential of TWiKers as a new inductive bias for enhancing transformer training.
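For concreteness, below is a minimal sketch of how the token-wise kernels described above could be realized, assuming a PyTorch implementation. The class name TWiKer, the kernel_size parameter, and the unfold-based windowing are illustrative choices and may differ from the released code; only the per-token kernel lookup and the Central-Dirac initialization follow the abstract.

```python
# Minimal sketch of a token-wise kernel (TWiKer) layer, assuming PyTorch.
# Illustrative only; not the paper's released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TWiKer(nn.Module):
    def __init__(self, vocab_size: int, kernel_size: int = 3):
        super().__init__()
        assert kernel_size % 2 == 1
        # One small kernel per vocabulary token, initialized to the
        # "Central Dirac" (e.g., [0, 1, 0] for kernel_size=3): each token
        # initially keeps all attention mass on its own position.
        dirac = torch.zeros(vocab_size, kernel_size)
        dirac[:, kernel_size // 2] = 1.0
        self.kernels = nn.Parameter(dirac)

    def forward(self, token_ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq); values: (batch, seq, dim) -- keys or values.
        k = self.kernels[token_ids]              # (batch, seq, kernel_size)
        pad = k.shape[-1] // 2
        v = F.pad(values, (0, 0, pad, pad))      # pad along the sequence axis
        windows = v.unfold(1, k.shape[-1], 1)    # (batch, seq, dim, kernel_size)
        # Mix each position's neighborhood with that token's learned kernel.
        return torch.einsum("bsdk,bsk->bsd", windows, k)
```

Applied to the values (or keys) before attention, each position's vector becomes a learned mixture over its neighborhood, so a kernel that stays close to [0, 1, 0] after training keeps the token self-focused, while deviation spreads its contribution to adjacent tokens.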
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Explanation faithfulness, Probing, Feature attribution, Data influence
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Previous URL: https://openreview.net/forum?id=bqzMvtQlRR
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: We respectfully request a reassignment, as the previous reviewers primarily focused on base model performance improvements, whereas the core innovation of our work lies in introducing an interpretable, token-specific inductive bias in transformers and demonstrating its application to lexical and stylistic analysis of English literature. While we have addressed the reviewers' concerns regarding model performance through additional experiments, we hope that a reassigned AC and reviewers will evaluate the paper with greater emphasis on its central contributions.
Software: tgz
Data: tgz
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: Our paper poses no foreseeable risks: it focuses solely on linguistic analysis and theoretical evaluation, and introduces no sensitive datasets, personal data, or technologies prone to misuse.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4; Appendix A
B2 Discuss The License For Artifacts: No
B2 Elaboration: We will release all code, data, and trained models under the MIT License via our GitHub repository, which will be made public upon acceptance. During the review period, we uploaded an anonymized version of the code and data to the submission site.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 4; Appendix A
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Appendix A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Appendix A
B6 Statistics For Data: Yes
B6 Elaboration: Section 4; Appendix A, B
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix B
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4; Appendix B
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix B
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Appendix E
Author Submission Checklist: Yes
Submission Number: 237