Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region

ICLR 2026 Conference Submission 20680 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: large step size, gradient descent, matrix factorization, convergence, implicit bias, chaos, fractal basin boundary

TL;DR: Gradient descent with near-critical step sizes enters a chaotic regime, characterized by sensitivity to initialization, fractal convergence regions, and absence of simple implicit biases.

Abstract: We examine gradient descent in matrix factorization and show that under large step sizes the parameter space develops a fractal structure. We derive the exact critical step size for convergence in scalar-vector factorization and show that near criticality the selected minimizer depends sensitively on the initialization. Moreover, we show that adding regularization amplifies this sensitivity, generating a fractal boundary between initializations that converge and those that diverge. The analysis extends to general matrix factorization with orthogonal initialization. Our findings reveal that near-critical step sizes induce a chaotic regime of gradient descent where the long-term dynamics are unpredictable and there are no simple implicit biases, such as towards balancedness, minimum norm, or flatness.
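To make the setting concrete, here is a minimal sketch of gradient descent on a scalar-vector factorization problem. It is not the authors' code. The loss L(a, b) = 0.5 * ||a*b - y||^2, the function name `run_gd`, and the thresholds `tol` and `blowup` are all assumptions made for illustration, since the abstract does not state the exact objective or any stopping rule. The sketch only reports whether a run converged, diverged, or stayed undecided for a given step size `eta`.

```python
import math


def run_gd(a0, b0, y, eta, steps=1000, tol=1e-10, blowup=1e12):
    """Gradient descent on the assumed loss L(a, b) = 0.5 * ||a*b - y||^2.

    a is a scalar, b and y are equal-length lists of floats.
    Returns (a, b, loss, status) with status in
    {"converged", "diverged", "undecided"}.
    """
    a, b = float(a0), [float(v) for v in b0]
    loss = math.inf
    for _ in range(steps):
        # Residual r = a*b - y and the current loss value.
        r = [a * bi - yi for bi, yi in zip(b, y)]
        loss = 0.5 * sum(ri * ri for ri in r)
        # Hypothetical stopping rules: blow-up and small-loss thresholds.
        if not math.isfinite(loss) or loss > blowup:
            return a, b, loss, "diverged"
        if loss < tol:
            return a, b, loss, "converged"
        # Gradients: dL/da = <r, b>, dL/db = a * r.
        grad_a = sum(ri * bi for ri, bi in zip(r, b))
        grad_b = [a * ri for ri in r]
        # Simultaneous update of both factors with step size eta.
        a = a - eta * grad_a
        b = [bi - eta * gi for bi, gi in zip(b, grad_b)]
    return a, b, loss, "undecided"


if __name__ == "__main__":
    # Sweep a few step sizes from one fixed initialization.
    for eta in (0.1, 0.3, 0.5, 1.0, 5.0):
        *_, loss, status = run_gd(1.0, [1.0, 0.0], [2.0, 0.0], eta)
        print(f"eta={eta}: {status} (loss={loss:.3e})")
```

Sweeping `eta` together with a grid of initializations `(a0, b0)` and recording the returned status is one way to probe the sensitivity to initialization that the abstract describes. The particular values used here are arbitrary and are not taken from the paper.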

Primary Area: optimization

Submission Number: 20680
