Keywords: finite-time optimization, element-wise dynamics, deep learning optimization, sign-based methods
Abstract: Optimization algorithms are fundamental to deep neural network training, where the growth of models from millions to hundreds of billions of parameters has made training acceleration a practical necessity. While adaptive methods such as Adam achieve remarkable success through element-wise learning rates, studying their continuous-time counterparts can provide theoretical insight into convergence guarantees beyond asymptotic rates.
Recent advances in continuous-time optimization have introduced fixed-time stable methods that guarantee convergence within a time bound independent of the initial condition. However, existing approaches such as FxTS-GF suffer from dimensional coupling: each coordinate update depends on the global gradient norm, which scales poorly in the high-dimensional problems typical of deep learning.
To address this issue, we introduce an element-wise finite-time optimization framework that eliminates dimensional coupling through coordinate-independent dual-power dynamics, in which each coordinate's update depends only on its own gradient component. We further extend the framework to momentum-enhanced variants for deep model training while preserving its convergence properties through continuous-time analysis. Under mild assumptions, we establish rigorous finite-time and fixed-time convergence guarantees. Notably, the framework reveals that widely used sign-based optimizers such as SignSGD and Signum emerge as limiting cases, providing theoretical grounding for their empirical effectiveness. Experiments on CIFAR-10/100 and C4 language modeling demonstrate consistent improvements over existing methods.
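As an illustrative aside (not taken from the submission), a minimal sketch of one possible element-wise dual-power dynamic consistent with the abstract's description is a per-coordinate flow dx_i/dt = -c1*sign(g_i)*|g_i|^p - c2*sign(g_i)*|g_i|^q with assumed exponents 0 < p < 1 < q and gains c1, c2 > 0; as p approaches 0 the first term approaches sign(g_i), matching the claim that SignSGD-like updates arise as a limiting case. The function name, parameter names, and exponent ranges below are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def dual_power_step(x, grad, lr=1e-3, c1=1.0, c2=1.0, p=0.5, q=1.5):
    """One forward-Euler step of an assumed element-wise dual-power dynamic:
        dx_i/dt = -c1*sign(g_i)*|g_i|**p - c2*sign(g_i)*|g_i|**q,   0 < p < 1 < q.
    Each coordinate uses only its own gradient entry (no global gradient norm),
    and as p -> 0 the first term tends to sign(g_i), i.e. a SignSGD-like update.
    All names and defaults here are hypothetical.
    """
    g = np.asarray(grad, dtype=float)
    mag = np.abs(g)
    update = c1 * np.sign(g) * mag**p + c2 * np.sign(g) * mag**q
    return x - lr * update

# Example usage on a toy quadratic f(x) = 0.5 * ||x||^2 (gradient is x itself).
x = np.array([10.0, -0.01, 3.0])
for _ in range(1000):
    x = dual_power_step(x, grad=x)
print(x)  # all coordinates driven toward zero at coordinate-independent rates
```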
Primary Area: optimization
Submission Number: 14751