Towards Stochastic Gradient Variance Reduction by Solving a Filtering Problem

01 Mar 2023 (modified: 05 Jun 2023) · Submitted to Tiny Papers @ ICLR 2023
Keywords: Stochastic Gradient Descent, Variance Reduction, Filtering Problem
Abstract: Stochastic gradient descent is commonly used to optimize deep neural networks, but it often produces noisy and unreliable gradient estimates that hinder convergence. To address this issue, we introduce \textbf{Filter Gradient Descent} (FGD), a family of stochastic optimization algorithms that consistently estimate the local gradient by solving an adaptive filtering problem. By incorporating historical states, FGD reduces the variance of stochastic gradient estimates and improves the current estimate. We demonstrate the efficacy of FGD in numerical optimization and neural network training, where it outperforms traditional momentum-based methods in terms of robustness and performance. Code is available at \url{https://github.com/Adamdad/Filter-Gradient-Decent}.
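To make the filtering idea concrete, the sketch below shows one simple instance of the general scheme the abstract describes: noisy stochastic gradients are passed through a filter over historical states before the descent step. This is a minimal illustration, not the paper's exact algorithm; the first-order low-pass filter, the gain `beta`, and the toy noisy quadratic objective are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x, sigma=1.0):
    """Gradient of f(x) = 0.5 * ||x||^2 corrupted by Gaussian noise,
    standing in for a minibatch stochastic gradient."""
    return x + sigma * rng.normal(size=x.shape)

def filter_gradient_descent(x0, lr=0.1, beta=0.9, steps=200):
    """Toy sketch of filtering-based variance reduction (assumed form):
    maintain a filtered gradient state from past observations and
    descend on the filtered estimate instead of the raw noisy gradient."""
    x = x0.copy()
    g_hat = np.zeros_like(x)  # filtered gradient state built from history
    for _ in range(steps):
        g = noisy_grad(x)
        g_hat = beta * g_hat + (1.0 - beta) * g  # first-order low-pass filter
        x -= lr * g_hat                          # step on the smoothed estimate
    return x

print(filter_gradient_descent(np.ones(5)))
```

With the filter gain set to zero this reduces to plain SGD on the noisy gradients; a nonzero gain trades a small bias toward past gradients for lower variance in the per-step update, which is the trade-off FGD exploits.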