Attend or Perish: Benchmarking Attention in Algorithmic Reasoning

ACL ARR 2025 May Submission5895 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Can transformers learn to perform algorithmic tasks reliably across previously unseen input/output domains? While pre-trained language models show solid accuracy on benchmarks incorporating algorithmic reasoning, assessing the reliability of these results requires distinguishing genuine algorithmic understanding from rote memorization. In this paper, we propose an algorithmic benchmark comprising five tasks with infinite input domains, for which we can also specify and trace the correct, robust algorithm required to solve them. This allows us to assess (i) models' ability to extrapolate to unseen types of inputs, including new lengths, value ranges, or input domains, and (ii) the robustness of their learned mechanisms. By analyzing attention maps and performing targeted interventions, we causally demonstrate that the attention mechanism is a key bottleneck, directly contributing to failures in extrapolation. We make the implementation of all our tasks and interpretability methods publicly available (see the supplementary material).
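To make the extrapolation setup concrete, here is a minimal illustrative sketch (not the authors' released benchmark code) of how train and extrapolation splits can be built for a toy algorithmic task with an unbounded input domain, where test inputs use operand lengths never seen during training:

```python
# Illustrative sketch only: train vs. extrapolation splits for a toy
# algorithmic task (multi-digit addition). The task, split sizes, and
# length ranges below are assumptions for demonstration, not the paper's
# actual benchmark configuration.
import random

def make_addition_example(num_digits: int) -> tuple[str, str]:
    """Sample an a+b problem whose operands have exactly `num_digits` digits."""
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits - 1
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    return f"{a}+{b}=", str(a + b)

def make_split(lengths, n_per_length):
    """Generate examples for every operand length in `lengths`."""
    return [make_addition_example(d) for d in lengths for _ in range(n_per_length)]

random.seed(0)
train = make_split(lengths=range(1, 6), n_per_length=1000)          # seen lengths: 1-5 digits
extrapolation = make_split(lengths=range(6, 11), n_per_length=200)  # unseen lengths: 6-10 digits

print(train[0])          # a short, in-distribution problem
print(extrapolation[0])  # a longer, out-of-distribution problem
```

Because the ground-truth algorithm (column-wise addition with carries) is known, per-example failures on the extrapolation split can be traced back to specific mechanistic errors, e.g., via attention-map inspection.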
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: extrapolation, reasoning, algorithmic reasoning, evaluation, interpretability
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 5895