It Depends: Understanding Why Models Struggle with Long-Range Dependencies

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: natural language processing, dependencies, systematic evaluation
Abstract: As researchers seek to better understand the architectures that have driven recent performance improvements in Natural Language Processing (NLP), analyses commonly focus on the handling of long-range dependencies. However, this phrase is used inconsistently across the literature, making it difficult to determine which element of a dependency is the key factor affecting performance. In this work we create a systematic framework for discussing dependencies and for carefully analysing how the many variables involved can impact architecture performance. By disentangling these factors, we clarify the discussion of dependencies and enable more meaningful comparisons across architectures. Using this framework, our experiments find that, despite often being the main focus of work on dependencies, the distance between a token and the tokens on which it depends is not a substantial factor in model performance. Instead, the number of tokens involved in the dependency, together with the complexity and nature of the dependency, are the important factors. We also find that architectural elements do not uniformly improve or degrade performance across tasks; rather, their effect depends on the nature of the dependency being modelled. This framework can be built on and used to motivate principled discussions of architecture performance in the future.
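To make the abstract's decomposition concrete, here is a minimal sketch (not the paper's actual framework, whose details are not given here) of how the factors it names could be varied independently in a synthetic task: the distance between a query token and its source tokens, the number of tokens involved, and the complexity of the dependency. All function names and task designs below are illustrative assumptions.

```python
import random

VOCAB = list(range(10))  # toy vocabulary of digit tokens

def make_example(distance, width, dep_type="copy", length=64, seed=None):
    """Build (sequence, target) where the answer depends on `width` source
    tokens ending `distance` positions before the final (query) position.

    dep_type:
      "copy" - target is the first source token (simple retrieval)
      "sum"  - target is the sum of the source tokens mod 10 (an
               aggregation, i.e. a more complex dependency over the
               same span of tokens)
    """
    rng = random.Random(seed)
    seq = [rng.choice(VOCAB) for _ in range(length)]
    query_pos = length - 1
    start = query_pos - distance - width + 1
    assert start >= 0, "sequence too short for this distance/width"
    sources = seq[start:start + width]
    if dep_type == "copy":
        target = sources[0]
    elif dep_type == "sum":
        target = sum(sources) % 10
    else:
        raise ValueError(dep_type)
    return seq, target

# Varying one factor at a time is what lets distance, width, and
# dependency type be disentangled when measuring model accuracy.
if __name__ == "__main__":
    for d in (4, 16, 48):
        seq, y = make_example(distance=d, width=3, dep_type="sum", seed=0)
        print(f"distance={d:2d} target={y}")
```

Under this kind of setup, comparing accuracy across the `distance` sweep while holding `width` and `dep_type` fixed isolates distance as a factor, matching the style of controlled comparison the abstract describes.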
Primary Area: applications to computer vision, audio, language, and other modalities
Supplementary Material: zip
Submission Number: 6150