Understanding the training dynamics of neural networks has attracted considerable interest in the scientific community. The training dynamics of over-parameterized models are characterized by the lazy regime, in which networks exhibit near-linear behavior and minimal parameter changes. In addition, it has been argued that the Jacobian of large neural models has a low-rank structure. In this paper, we focus on the opportunities opened up by the combination of the low-rankness and laziness of large neural models. Specifically, we provide a scalable way to measure the extent of laziness, evaluated via the rate of change of the model Jacobian, as well as a scalable method to verify the low-rankness of the model Jacobian without storing the entire Jacobian. Taking advantage of both laziness and low-rankness, we design a scalable training algorithm for over-parameterized models that performs backpropagation-free gradient descent. In particular, this algorithm requires less computation and storage when parameters are massively shared, as is the case in many state-of-the-art neural architectures. Empirical results confirm the scalability and effectiveness of our approach, opening new pathways for exploring novel learning strategies in neural networks.
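To make the "without storing the entire Jacobian" claim concrete, the following is a minimal, matrix-free sketch (not the authors' exact procedure) of how one might probe the rank structure of a model Jacobian: push a handful of random parameter-space directions through Jacobian-vector products to form a sketch Y = J Ω, then inspect the singular-value decay of Y. The `apply_fn(params, inputs)` signature is an assumed Flax-style interface.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def jacobian_sketch(apply_fn, params, inputs, key, k=32):
    """Sketch Y = J @ Omega of the model Jacobian J (outputs w.r.t. parameters)
    using k Jacobian-vector products, so J itself is never materialized."""
    flat_params, unravel = ravel_pytree(params)

    def f(theta):
        # Flattened network outputs as a function of flattened parameters.
        return apply_fn(unravel(theta), inputs).ravel()

    # Random Gaussian test directions in parameter space.
    omega = jax.random.normal(key, (k, flat_params.size))
    # Each JVP yields one column J @ omega_i without forming J.
    columns = [jax.jvp(f, (flat_params,), (w,))[1] for w in omega]
    return jnp.stack(columns, axis=1)  # shape: (n_outputs, k)

# Rapid decay of the sketch's singular values indicates an (approximately) low-rank Jacobian:
# Y = jacobian_sketch(model.apply, params, x_batch, jax.random.PRNGKey(0))
# print(jnp.linalg.svd(Y, compute_uv=False))
```

The same sketch, recomputed at different training steps with a fixed Ω, also gives a cheap proxy for laziness: a small relative change ‖Y_t − Y_0‖ / ‖Y_0‖ suggests the Jacobian has moved little during training.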