Keywords: Optimizer, Physics-Inspired, Kinetics
Abstract: The design of optimization algorithms for neural networks remains a critical challenge, with most existing methods relying on heuristic adaptations of traditional gradient-based approaches. This paper introduces KO (Kinetics-inspired Optimizer), a novel neural network optimizer inspired by kinetic theory and partial differential equation (PDE) simulations. KO can be combined with multiple base optimizers (e.g., Adam, SGD). In KO, the training dynamics of network parameters are modeled as the evolution of a particle system, where parameter updates are simulated via a numerical scheme for the Boltzmann transport equation (BTE) that captures stochastic particle collisions. This physics-driven approach inherently promotes parameter diversity during optimization, mitigating parameter condensation, i.e., the collapse of network parameters into low-dimensional subspaces, which has been shown to harm model generalization. We analyze KO's impact on parameter diversity, providing both a rigorous mathematical proof and a physical interpretation, and we further establish convergence guarantees for the proposed optimizer. Extensive experiments on image classification (CIFAR-10/100, ImageNet) and text classification (IMDB, Snips) tasks demonstrate that KO consistently outperforms baseline optimizers, improving accuracy at comparable computational cost.
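To make the collision idea concrete, the following is a minimal, hypothetical sketch of a kinetics-style update: a plain SGD step followed by stochastic pairwise "collisions" that scatter paired parameters around their pair mean while conserving it, loosely mimicking momentum-conserving particle collisions to keep parameters diverse. The function name `ko_style_step`, the `collision_rate` parameter, and the collision rule are illustrative assumptions, not the paper's actual BTE scheme.

```python
import numpy as np

def ko_style_step(params, grads, lr=0.1, collision_rate=0.1, rng=None):
    """Illustrative sketch (not the paper's method): base SGD step,
    then random pairwise 'collisions' that perturb parameters while
    conserving each pair's mean, promoting parameter diversity."""
    rng = np.random.default_rng() if rng is None else rng
    # Base optimizer step (plain SGD here; any base optimizer could be used).
    new = params - lr * grads
    # Randomly pair up parameters; each pair collides with some probability.
    idx = rng.permutation(new.shape[0])
    for i, j in zip(idx[0::2], idx[1::2]):
        if rng.random() < collision_rate:
            mean = 0.5 * (new[i] + new[j])
            # Post-collision states scatter symmetrically around the pair
            # mean, so the pair sum (a momentum-like invariant) is conserved.
            delta = rng.normal(scale=0.5 * abs(new[i] - new[j]) + 1e-12)
            new[i], new[j] = mean + delta, mean - delta
    return new
```

With `collision_rate=0` this reduces exactly to SGD; with collisions enabled, the total parameter sum is unchanged relative to the SGD step, while individual parameters are dispersed.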
Primary Area: optimization
Submission Number: 5662