Bridging Between Stable Rank and Data Selection: A Novel Sampling Method for Fast Training of Deep Neural Networks
Keywords: data-efficient training, stable rank, gradient trajectory, low-dimensional training, importance sampling
Abstract: Data selection for efficient training aims to reduce computational cost by selecting a subset of the data that approximates the objective function. A number of elegant approaches have been proposed in recent years, such as the popular importance sampling and coreset methods. However, their required sample sizes usually depend linearly on the dimension of the parameter space (or on other notions of dimension, such as the pseudo-dimension), which can be very large and thus hinders their application to deep neural networks. In this paper, we aim to provide a deeper theoretical understanding of the connection between data selection and the complexity of the training space. Inspired by the effectiveness of prevalent low-rank fine-tuning techniques, we propose to study the sample size from the perspective of the Gradient Trajectory (GT). Specifically, we measure the dimension of the training space by the "stable rank" of the gradient trajectory matrix (GT matrix), and propose a novel data selection method called "Stable Rank related Stratified Sampling" (SRS-Sampling) to accelerate the training process. Moreover, we establish a theoretical framework relating the evolving stable rank of the GT matrix to the required sample size. Finally, we conduct a set of experiments across pre-training and fine-tuning to validate the effectiveness of SRS-Sampling.
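For readers unfamiliar with the term, the stable rank of a matrix is commonly defined as the squared Frobenius norm divided by the squared spectral norm. The sketch below computes it with NumPy. It is only an illustration of that standard definition, not the SRS-Sampling method itself: the abstract does not specify how the GT matrix is built, so the layout used here (one row per training step, each row a flattened gradient) and the names `stable_rank`, `gt_rank_one`, and `gt_identity` are assumptions.

```python
import numpy as np


def stable_rank(matrix: np.ndarray) -> float:
    """Return ||A||_F^2 / ||A||_2^2, the standard stable rank of a 2-D array."""
    fro_sq = np.linalg.norm(matrix, "fro") ** 2   # sum of squared singular values
    spec_sq = np.linalg.norm(matrix, 2) ** 2      # largest squared singular value
    if spec_sq == 0.0:
        return 0.0  # convention chosen here for the all-zero matrix
    return float(fro_sq / spec_sq)


# Hypothetical GT matrices (assumed layout: rows = steps, columns = parameters).
rng = np.random.default_rng(0)
direction = rng.standard_normal(50)
step_scales = rng.standard_normal(20)

# Every gradient points along one direction, so the trajectory is rank one.
gt_rank_one = np.outer(step_scales, direction)

# Gradients at different steps are orthonormal, so all singular values are equal.
gt_identity = np.eye(8)

print(stable_rank(gt_rank_one))
print(stable_rank(gt_identity))
```

The stable rank never exceeds the ordinary rank and changes only slightly under small perturbations of the matrix, which is one reason it is often preferred as a measure of effective dimension.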
Supplementary Material: zip
Primary Area: optimization
Submission Number: 11533