Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting
Keywords: Transformer, Benign overfitting, Feature learning theory, Generalization error bounds, Signal-to-noise ratio
TL;DR: We present generalization error bounds for a two-layer Transformer under both benign and harmful overfitting.
Abstract: Transformers serve as the foundational architecture for many successful large-scale models, demonstrating the ability to overfit the training data while maintaining strong generalization on unseen data, a phenomenon known as benign overfitting. However, existing research has not sufficiently explored the generalization and training dynamics of transformers under benign overfitting. This paper addresses this gap by analyzing a two-layer transformer's training dynamics, convergence, and generalization under label noise. Specifically, we present generalization error bounds for benign and harmful overfitting under varying signal-to-noise ratios (SNR), where the training dynamics are categorized into three distinct stages, each with its corresponding error bounds. Additionally, we conduct extensive experiments to identify key factors in transformers that influence test loss. Our experimental results align closely with the theoretical predictions, validating our findings.
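The abstract does not spell out the formal setup, so the following is a minimal PyTorch sketch of the kind of signal-plus-noise data model with label flipping, and the kind of two-layer (attention + linear head) architecture, commonly used in feature-learning analyses of benign overfitting. All names (`make_snr_dataset`, `TwoLayerTransformer`), the SNR scaling, and the training loop are illustrative assumptions, not the paper's actual construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_snr_dataset(n, d, snr, label_noise_p, seed=0):
    """Hypothetical data model: each example has one signal token (+/- mu)
    and one pure-noise token; labels are flipped with probability p."""
    g = torch.Generator().manual_seed(seed)
    mu = torch.zeros(d)
    mu[0] = snr * d ** 0.5                       # assumed scaling: ||mu|| = snr * sqrt(d), unit noise
    y = torch.randint(0, 2, (n,), generator=g) * 2 - 1   # clean labels in {-1, +1}
    signal = y.float().unsqueeze(1) * mu                  # signal token carries y * mu
    noise = torch.randn(n, d, generator=g)                # noise token ~ N(0, I)
    X = torch.stack([signal, noise], dim=1)               # shape (n, 2 tokens, d)
    flip = torch.rand(n, generator=g) < label_noise_p     # independent label noise
    return X, torch.where(flip, -y, y).float()

class TwoLayerTransformer(nn.Module):
    """Stand-in two-layer model: one self-attention layer + linear head."""
    def __init__(self, d):
        super().__init__()
        self.qk = nn.Linear(d, d, bias=False)    # merged query-key map
        self.v = nn.Linear(d, d, bias=False)     # value map
        self.head = nn.Linear(d, 1, bias=False)  # second layer: linear readout

    def forward(self, X):                                     # X: (n, tokens, d)
        scores = X @ self.qk(X).transpose(1, 2) / X.shape[-1] ** 0.5
        attn = scores.softmax(dim=-1)                         # token-to-token attention
        h = (attn @ self.v(X)).mean(dim=1)                    # pooled attention output
        return self.head(h).squeeze(-1)                       # scalar logit per example

# Usage: sweep `snr` to probe the regimes the paper's bounds distinguish;
# intuitively, high SNR tends toward benign overfitting (train error -> 0,
# low test error despite label noise), low SNR toward harmful overfitting.
X_tr, y_tr = make_snr_dataset(200, 64, snr=1.0, label_noise_p=0.1)
X_te, y_te = make_snr_dataset(1000, 64, snr=1.0, label_noise_p=0.0, seed=1)
model = TwoLayerTransformer(64)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = F.softplus(-y_tr * model(X_tr)).mean()  # logistic loss for +/-1 labels
    loss.backward()
    opt.step()
test_err = (model(X_te).sign() != y_te).float().mean()
```

The three training stages the abstract refers to would show up in such a run as distinct phases of the train/test loss curves; the sketch only provides the scaffolding to observe them, not the paper's analysis.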
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 18976