TL;DR: We show how FTRL with specific choices of linearization can achieve dynamic regret guarantees.
Abstract: We revisit the Follow the Regularized Leader (FTRL) framework for Online Convex Optimization (OCO) over compact sets, focusing on achieving dynamic regret guarantees. Prior work has highlighted the framework’s limitations in dynamic environments due to its tendency to produce "lazy" iterates. Building on insights showing FTRL’s ability to also produce "agile" iterates, however, we show that it can indeed recover known dynamic regret bounds through optimistic composition of future costs and careful linearization of past costs, which may entail pruning some of them. This new analysis of FTRL against dynamic comparators yields a principled way to interpolate between greedy and agile updates and offers several benefits, including refined control over regret terms, optimism without cyclic dependence, and the application of minimal recursive regularization akin to AdaFTRL. More broadly, we show that it is not the "lazy" projection style of FTRL that hinders (optimistic) dynamic regret, but rather the decoupling of the algorithm’s state (the linearized history) from its iterates, which allows the state to grow arbitrarily; pruning resynchronizes the two when necessary.
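To make the abstract's mechanism concrete, here is a minimal sketch of one optimistic FTRL step over linearized past costs with optional pruning. Everything in it is an illustrative assumption rather than the paper's actual algorithm: the Euclidean-ball domain, the quadratic regularizer, the "keep the most recent k gradients" pruning rule, and the names `optimistic_linearized_ftrl_step` and `project_l2_ball` are chosen only for exposition.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def optimistic_linearized_ftrl_step(past_grads, hint, eta=0.1, keep_last=None, radius=1.0):
    """One optimistic FTRL step over linearized past costs (illustrative sketch).

    past_grads : list of gradients g_s, each past cost replaced by its linearization <g_s, x>
    hint       : optimistic guess of the next gradient (the composed "future cost")
    keep_last  : if set, prune the linearized history to the most recent `keep_last` gradients
    eta        : scale of the quadratic regularizer ||x||^2 / (2 * eta)

    With this regularizer, minimizing sum_s <g_s, x> + <hint, x> + ||x||^2 / (2 * eta)
    over the L2 ball is exactly the projection of -eta * (sum_s g_s + hint) onto the ball.
    """
    history = past_grads if keep_last is None else past_grads[-keep_last:]
    state = hint + sum(history)  # algorithm state: linearized history plus optimistic hint
    return project_l2_ball(-eta * state, radius)

# Tiny usage example on drifting linear losses f_t(x) = <g_t, x>,
# with the naive hint that the next gradient repeats the latest one.
rng = np.random.default_rng(0)
grads, x = [], np.zeros(3)
for t in range(5):
    g = rng.normal(size=3) + 0.5 * t  # slowly drifting gradients
    grads.append(g)
    x = optimistic_linearized_ftrl_step(grads, hint=g, eta=0.1, keep_last=3)
    print(t, np.round(x, 3))
```

In this toy version, `keep_last` is the knob that keeps the state from growing arbitrarily far from the current iterate: with `keep_last=None` the update behaves like standard "lazy" FTRL, while small values yield more "agile" iterates, mirroring (in a simplified way) the interpolation described in the abstract.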
Lay Summary: Machine learning systems often make sequential decisions, such as choosing investments or managing resources, but struggle when environments constantly change. Existing methods either adapt slowly or rely heavily on perfect future predictions, limiting their practical use.
We revisited a classical decision-making method called "Follow the Regularized Leader" (FTRL), known for its simplicity but previously thought unsuitable for dynamic environments. By carefully selecting and sometimes discarding (pruning) past information, our improved approach makes smarter decisions that adapt quickly when conditions change.
The analysis also shows that this enhanced FTRL algorithm achieves robust performance, even when predictions about future conditions are uncertain. It smoothly adjusts between cautious and aggressive strategies based on the reliability of the predictions so far. This provides flexibility, ensuring good performance both when future predictions are accurate and when they fail.
These results help extend the practical applicability of FTRL, giving developers a flexible tool to build smarter systems that can adapt effectively in real-world, changing environments.
Primary Area: General Machine Learning->Online Learning, Active Learning and Bandits
Keywords: Online Convex Optimization, FTRL, Dynamic Regret, Optimism
Submission Number: 11679