Keywords: Mechanistic Interpretation, Positional Encoding, Length Generalization, Iteration Head, Reasoning Tasks.
TL;DR: Exploring the mechanism of Positional Encoding in Length Generalization on Reasoning Tasks
Abstract: Length generalization (LG) is a persistent challenge for Transformers. Although recent studies have improved models' LG capability, the underlying mechanisms remain underexplored. To better understand LG, we propose that it requires aligning the model's inductive bias with the task's computational structure, and we validate this view with experiments on Transformers. Focusing on iterative tasks (e.g., Polynomial Iteration, Parity, Binary Copy), we systematically analyze different positional encodings (PEs) and find that misalignment persists in Transformers: the structural bias of softmax attention and the computational biases introduced by PEs destabilize LG under extrapolation. Notably, Transformers without positional encoding (NoPE) can show partial LG capability, potentially because implicit position information carried by hidden-state statistics and contextual token distributions preserves consistent computation under extrapolation; however, these signals decay with length, leaving the encoding misaligned with the task. Building on this mechanistic analysis, we introduce a lightweight enhancement, value-side relative coding with logit rescaling, that better aligns the inductive bias with the task structure. This sustains iterative computation and improves LG, offering insights for future PE design.
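The abstract names the proposed enhancement but not its exact form. As an illustrative sketch only (not the authors' implementation), the two ingredients can be read as: (1) adding relative-position embeddings on the value side of attention rather than the logit side, and (2) rescaling attention logits as context length grows. The function and array shapes below are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_value_relative(q, k, v, rel_v, rescale=True):
    """Single-head attention sketch with value-side relative coding
    and log-length logit rescaling (hypothetical form, for illustration).

    q, k, v : (n, d) query/key/value matrices.
    rel_v   : (n, n, d) relative-position embeddings added on the value side.
    """
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    if rescale:
        # Rescale logits with context length so attention does not
        # flatten out as n grows (one common length-extrapolation trick).
        logits = logits * np.log(n)
    w = softmax(logits, axis=-1)  # (n, n) attention weights
    # Value-side relative coding: each attended value is offset by a
    # relative-position embedding instead of biasing the logits.
    return w @ v + np.einsum('ij,ijd->id', w, rel_v)
```

With `rel_v` set to zero and `rescale=False`, this reduces to plain scaled dot-product attention, which makes the two added inductive biases easy to ablate in isolation.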
Primary Area: interpretability and explainable AI
Submission Number: 17734