Automated Attention Pattern Discovery at Scale in Large Language Models

TMLR Paper 5837 Authors

07 Sept 2025 (modified: 02 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Large language models have scaled rapidly, but interpretability methods have lagged behind, especially on real-world, noisy data that is less controlled than curated benchmarks. Existing approaches focus on fine-grained explanations of individual components, which are resource-intensive and struggle to generalize across tasks, domains, and models. To enable broader insights, we analyze and track attention patterns across predictions. We show that vision models offer a promising direction for analyzing attention patterns at scale. To demonstrate this, we introduce the Attention Pattern Masked Autoencoder (AP-MAE), a vision transformer-based model that efficiently reconstructs masked attention patterns. Experiments on StarCoder2 models (3B–15B) show that AP-MAE (i) reconstructs masked attention patterns with high accuracy, (ii) generalizes to unseen models with minimal degradation, (iii) reveals recurring patterns across a large number of inferences, (iv) predicts whether a generation will be correct without access to ground truth, with up to 70% accuracy, and (v) enables targeted interventions that increase accuracy by 13.6% when applied selectively, but cause rapid collapse when applied excessively. These results establish attention patterns as a scalable signal for interpretability and demonstrate that AP-MAE provides a transferable foundation for both analysis and intervention in large language models. Beyond its standalone value, AP-MAE can also serve as a selection procedure to guide more fine-grained mechanistic approaches toward the most relevant components. We release code and models to support future work in large-scale interpretability.
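As a rough illustration of the idea described in the abstract, the sketch below shows a ViT-style masked autoencoder that treats an L×L attention pattern as a single-channel image, masks a fraction of its patches, and reconstructs them. This is a minimal sketch under assumed hyperparameters (sequence length, patch size, masking ratio, network depth); the class name `AttentionPatternMAE` and all sizes are hypothetical and are not the paper's released implementation.

```python
# Hypothetical sketch of an AP-MAE-style model: a ViT-style masked autoencoder
# over attention patterns. All names, sizes, and the masking ratio are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class AttentionPatternMAE(nn.Module):
    def __init__(self, seq_len=64, patch=8, dim=128, depth=4, heads=4, mask_ratio=0.75):
        super().__init__()
        self.patch = patch
        self.num_patches = (seq_len // patch) ** 2
        self.mask_ratio = mask_ratio
        self.patch_embed = nn.Linear(patch * patch, dim)          # patchify + project
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, depth)
        dec_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, 1)
        self.head = nn.Linear(dim, patch * patch)                 # reconstruct patch values

    def patchify(self, attn):
        # attn: (B, L, L) attention pattern -> (B, num_patches, patch*patch)
        B, L, _ = attn.shape
        p = self.patch
        x = attn.reshape(B, L // p, p, L // p, p).permute(0, 1, 3, 2, 4)
        return x.reshape(B, self.num_patches, p * p)

    def forward(self, attn):
        x = self.patch_embed(self.patchify(attn)) + self.pos_embed
        B, N, D = x.shape
        # Randomly keep a subset of patches; the rest are masked out.
        keep = int(N * (1 - self.mask_ratio))
        perm = torch.rand(B, N, device=x.device).argsort(dim=1)
        idx_keep = perm[:, :keep]
        visible = torch.gather(x, 1, idx_keep.unsqueeze(-1).expand(-1, -1, D))
        encoded = self.encoder(visible)
        # Scatter encoded tokens back; masked positions get the mask token.
        full = self.mask_token.expand(B, N, D).clone()
        full = full.scatter(1, idx_keep.unsqueeze(-1).expand(-1, -1, D), encoded)
        decoded = self.decoder(full + self.pos_embed)
        return self.head(decoded)                                 # (B, N, patch*patch)


# Usage: reconstruct masked patches of a batch of stand-in attention patterns.
model = AttentionPatternMAE()
attn_maps = torch.softmax(torch.randn(2, 64, 64), dim=-1)
recon = model(attn_maps)
loss = nn.functional.mse_loss(recon, model.patchify(attn_maps))
```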
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Vlad_Niculae2
Submission Number: 5837