How to Retrain Online Models Optimally with Few Updates

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Online Learning, Retraining Schedules, Learning Curve, Adaptive Algorithms, Distribution Shifts
TL;DR: We develop a theoretical framework for characterizing the optimal retraining frequency in online learning under various conditions.
Abstract: Retraining is the primary mechanism by which AI models can update their internal parameters in response to evolving environments, yet it is also one of the most costly operations. This raises a fundamental question: *how often must a model be retrained to achieve optimal performance over time?* We address this problem in the framework of *online learning*, beginning with the classical but foundational case of i.i.d. realizable data. We show that retraining at every step is unnecessary: in most cases, only $O(\log T)$ updates suffice to achieve near-optimal risk, where $T$ is the number of steps. Furthermore, when the *learning curve* decays as $1/t^{\alpha}$ with $\alpha < 1$, as few as $O(\log \log T)$ updates are enough. We design algorithms that achieve these guarantees, including adaptive methods that remain optimal when $\alpha$ is unknown, and extend our analysis to piecewise-stationary settings with *distribution shifts*. We also establish sharp impossibility results, proving that no universal algorithm exists without prior knowledge of the learning curve. Together, these results provide the first precise characterization of optimal retraining frequency, bridging foundational theory with practical strategies for scalable AI.
Primary Area: learning theory
Submission Number: 10677
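
Illustrative sketch (not taken from the submission): the abstract's claim that $O(\log T)$ updates can suffice may be illustrated with a standard geometric (doubling) retraining grid. The code below assumes a hypothetical learning curve in which a model trained on $s$ samples has risk $s^{-\alpha}$, and compares the cumulative risk of retraining only at steps $1, 2, 4, \dots$ against retraining at every step. The function names `doubling_schedule` and `cumulative_risk` are made up for this example, and the doubling grid is a textbook choice rather than the paper's algorithm.

```python
import math


def doubling_schedule(T):
    """Return retraining times 1, 2, 4, ... not exceeding T (about log2(T) updates)."""
    times, t = [], 1
    while t <= T:
        times.append(t)
        t *= 2
    return times


def cumulative_risk(T, retrain_times, alpha):
    """Sum the per-step risk over steps 1..T.

    Assumption (hypothetical learning curve): the model in use at step t was
    trained at the most recent retraining time s <= t and has risk s ** (-alpha).
    The schedule must include step 1 so that a model always exists.
    """
    retrain = set(retrain_times)
    total, last = 0.0, None
    for t in range(1, T + 1):
        if t in retrain:
            last = t  # retrain on all data seen so far
        total += last ** (-alpha)
    return total


if __name__ == "__main__":
    T, alpha = 10_000, 0.5
    sparse = doubling_schedule(T)
    risk_sparse = cumulative_risk(T, sparse, alpha)
    risk_dense = cumulative_risk(T, range(1, T + 1), alpha)
    print(f"updates used: {len(sparse)} of {T}")
    print(f"risk ratio (doubling / every step): {risk_sparse / risk_dense:.3f}")
    print(f"worst-case bound 2**alpha: {2 ** alpha:.3f}")
```

Under this assumed curve, the most recent retraining time $s$ always satisfies $s > t/2$, so each step's risk is inflated by at most a factor of $2^{\alpha}$. This is only a constant-factor guarantee. The $O(\log \log T)$ regime mentioned in the abstract would require a sparser grid, which this sketch does not attempt to reproduce.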