Hallmarks of Optimization Trajectories in Neural Networks and LLMs: Directional Exploration and Redundancy

Published: 18 Jun 2024, Last Modified: 18 Jul 2024, TF2M 2024 Poster, CC BY 4.0
Keywords: optimization, directional redundancy, efficiency, LLMs
Abstract: We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of optimization trajectories, represented by their pointwise parameters. To this end, we introduce a natural notion of the complexity of optimization trajectories, which helps hallmark the directional nature of optimization in neural networks: when there is redundancy, and when there is exploration. We use the trajectory perspective to showcase the effect of scale on regularizing the directional nature of trajectories. As a by-product, we also observe an intriguing heterogeneity of Q, K, V dynamics in the middle attention layers of LLMs, which, however, is homogenized by scale. Importantly, we put the significant directional redundancy we observe to the test by demonstrating that training only the scalar batchnorm parameters, starting some way into training, matches the performance of training the entire network, thus exhibiting the potential for hybrid optimization schemes geared towards efficiency.
Submission Number: 55
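
To make the idea of directional redundancy versus exploration in the abstract concrete, here is a minimal sketch that compares parameter checkpoints along a trajectory by pairwise cosine similarity. The abstract does not define its complexity measure, so this is an illustrative assumption and not the authors' method; the function names (cosine, trajectory_similarity, mean_off_diagonal) and the toy checkpoints are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flattened parameter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def trajectory_similarity(checkpoints):
    """Pairwise cosine-similarity matrix over a list of checkpoints."""
    n = len(checkpoints)
    return [[cosine(checkpoints[i], checkpoints[j]) for j in range(n)]
            for i in range(n)]


def mean_off_diagonal(matrix):
    """Average similarity between distinct checkpoints.

    Values near 1 suggest the checkpoints point in nearly the same
    direction (redundancy); lower values suggest the trajectory visits
    different directions (exploration). Assumed summary statistic.
    """
    n = len(matrix)
    if n < 2:
        return 0.0
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Toy checkpoints (hypothetical data, not from the paper).
redundant = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # same direction, growing norm
exploratory = [[1.0, 0.0], [0.0, 1.0]]            # orthogonal directions

redundant_score = mean_off_diagonal(trajectory_similarity(redundant))
exploratory_score = mean_off_diagonal(trajectory_similarity(exploratory))
```

In this toy setting the collinear checkpoints score high and the orthogonal ones score low. For a real network, each checkpoint would be the flattened parameter vector saved at some point during training.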