Efficient Gradient Estimation via Adaptive and Importance Sampling

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: SGD, Importance sampling, Adaptive sampling, Classification
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Classification tasks in machine learning heavily depend on stochastic gradient descent~(SGD) for optimization. The efficiency of SGD hinges on accurate gradient estimation from a mini-batch of data samples. Adaptive or importance sampling, as opposed to the popular uniform sampling, diminishes gradient-estimation noise by constructing mini-batches that emphasize crucial data points. Prior work has shown that data points should ideally be chosen with probability proportional to the magnitude of their gradient. However, computing these magnitudes for each sample incurs a heavy computational overhead. We propose a simplified importance function that depends \textit{only} on the loss gradient at the output layer. We analytically derive this loss gradient for classification problems and establish an upper bound on its magnitude. Leveraging the proposed gradient estimation, we report enhanced convergence in various classification tasks with minimal computational overhead. We demonstrate the effectiveness of our importance-sampling strategy on image and point-cloud datasets.
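The core idea in the abstract can be sketched concretely. This is a minimal illustration, not the authors' implementation: it assumes a softmax cross-entropy classifier, for which the output-layer loss gradient is simply `softmax(logits) - y`, so its per-sample norm is cheap to compute. Samples are then drawn with probability proportional to that norm, and each drawn sample is reweighted by `1/(N * p_i)` to keep the mini-batch gradient estimate unbiased.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def importance_probs(logits, labels_onehot, eps=1e-12):
    # Per-sample importance: L2 norm of the output-layer loss gradient.
    # For softmax cross-entropy this gradient is (p - y), so no backprop
    # through the network is needed to score each sample.
    g = softmax(logits) - labels_onehot
    w = np.linalg.norm(g, axis=1) + eps  # eps guards against all-zero weights
    return w / w.sum()

def sample_minibatch(p, batch_size, rng):
    # Draw indices proportional to importance; return the 1/(N * p_i)
    # correction weights that keep the gradient estimator unbiased.
    n = len(p)
    idx = rng.choice(n, size=batch_size, replace=True, p=p)
    return idx, 1.0 / (n * p[idx])
```

A training loop would score the dataset (or a cached subset) with `importance_probs`, draw a mini-batch with `sample_minibatch`, and multiply each sample's loss by its correction weight before backpropagation. The function names and the `eps` floor here are illustrative choices, not part of the paper.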
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3630